1.  OVERVIEW


Since the late 1970s, many applications (including mission-critical ones) have been recognized as
suitable for distributed computing. The benefits of such data processing systems include high
performance, high availability, high reliability, resource sharing, high adaptability to workload
changes, and modular, incremental growth.

Some applications require distributed computing because of their inherent nature. For
mission-critical applications where high availability (achieved through fault tolerance) is the
foremost concern, redundancy of hardware, software, and data is required, and these resources are
preferably widely distributed.

To the user, a distributed computing system behaves like a “virtual uni-processor,” which was
how such a system was described in the 1970s. With the steady advance of communication
infrastructure over the last two decades, a metacomputer, currently called a virtual
shared-computing machine, can compute across a wide area, and even across continents. This
“virtual supercomputer” provides a unified system image, making all computing resources
transparently available to users.

A computational grid is a recently emerging concept analogous to an electrical power grid,
offering potential access to computational resources in a ubiquitous, inexpensive, and dependable
manner. This fresh point of view emphasizes the vastness of the future computing environment, the
need for infrastructure, and the far-reaching consequences.

Today, computational grids do not exist. However, there has been extensive research over the last
5 years in wide-area, heterogeneous, distributed computing, thanks to technology advances in
microprocessors and interconnection networking. Widespread experimentation with large-scale
distributed-computing testbeds has been initiated at government and academic research
laboratories. The main thrusts are to design, implement, and deploy various building blocks for the
grid environment, and to explore enhanced capabilities of existing applications and new capabilities
of promising grid-enabled applications.

This document provides an overview of recent developments in several important aspects of
wide-area, heterogeneous, distributed computing. Section 2 presents information on affordable,
high-performance, commodity computing clusters, a building block of computational grids. The rapid
advent of mass-market, off-the-shelf microprocessors and high-speed networking has brought
commodity cluster computing into focus as an affordable high-performance computing alternative for
testbed development and deployment. Commodity clusters demonstrated scalable, high, sustained
performance at an affordable cost at the Supercomputing Conference SC’97 in San Jose, CA, by
co-winning a Gordon Bell Prize for price/performance. A real-world, large-scale example is the
Whitney project of the Numerical Aerospace Simulation (NAS) Systems Division at NASA Ames
Research Center. This project is integrating off-the-shelf hardware and software technologies to
build a cluster of hundreds to thousands of nodes supporting scientific workload. These
developments illustrate the potential of commodity clusters as a building block of computational
grids. The lack of affordable, scalable interfaces for network interconnects is one major challenge
to commodity clusters. The industry-standard Virtual Interface Architecture (VIA), spearheaded by
Intel, Compaq, and Microsoft, with contributions from over 100 industry and research organizations,
recently addressed this problem. VIA will be immediately applied in the recently ratified
IEEE-standard Gigabit Ethernet technology. The coupling of VIA and the emerging Gigabit Ethernet
may become the dominant force in short-haul cluster networking and, in turn, may provide the path
to low-cost, high-performance, scalable commodity clusters.

Workload-management software (WMS) is another building block of computational grids.
Essentially, WMS performs distributed, network-wide job management by dynamically scheduling
resources to virtually transform a network of heterogeneous platforms into a single,
shared-computing machine. The four main requirements for WMS are network and resource awareness,
load balancing, fault tolerance, and security.

Section 3 discusses the current state of WMS by identifying leading WMS packages and describing
their most desirable features in detail. These packages have been used in massively parallel
systems as well as in loosely coupled clusters. However, the current limitations of these packages
are well recognized. Of immediate importance is the lack of WMS interoperability. In a
computational grid environment, resources are heterogeneous, located across many sites, and under
different administrative domains. Most likely, resources are also managed by different WMS
packages, which makes the lack of interoperability an immediate problem for computational grids.

There are two ways to achieve WMS interoperability. One is through standardization. The
PSCHED initiative, a recent standards effort, has not been successful because principal vendors have
shown little interest. The other way to arrive at interoperability is to devise a high-level
resource-allocation manager that interfaces with local WMS packages. The Globus project adopted the
latter approach to build a grid resource-management architecture. This development is described in
section 4.
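
The second approach above can be pictured as a thin layer that speaks one uniform interface and
delegates to whatever package runs at each site. The following Python fragment is only a minimal
sketch under stated assumptions: the class names (LocalWMS, InMemoryWMS, ResourceAllocationManager)
and the first-fit placement rule are hypothetical illustrations, not the Globus design or the
interface of any real WMS package.

```python
from abc import ABC, abstractmethod


class LocalWMS(ABC):
    """Hypothetical uniform interface the high-level manager expects from each site."""

    @abstractmethod
    def free_cpus(self) -> int:
        """Number of CPUs the site can currently offer."""

    @abstractmethod
    def submit(self, job_name: str, cpus: int) -> str:
        """Hand the job to the site-local package and return a job identifier."""


class InMemoryWMS(LocalWMS):
    """Stand-in for a site-local package; a real adapter would call that package's own tools."""

    def __init__(self, site: str, cpus: int) -> None:
        self.site = site
        self._free = cpus
        self._submitted = 0

    def free_cpus(self) -> int:
        return self._free

    def submit(self, job_name: str, cpus: int) -> str:
        if cpus > self._free:
            raise RuntimeError(f"{self.site} cannot provide {cpus} CPUs")
        self._free -= cpus
        self._submitted += 1
        return f"{self.site}-{self._submitted}"


class ResourceAllocationManager:
    """High-level manager: picks the first site that can satisfy the request (assumed policy)."""

    def __init__(self, sites) -> None:
        self.sites = list(sites)

    def submit(self, job_name: str, cpus: int) -> str:
        for wms in self.sites:
            if wms.free_cpus() >= cpus:
                return wms.submit(job_name, cpus)
        raise RuntimeError("no site can satisfy the request")


manager = ResourceAllocationManager([InMemoryWMS("siteA", 4), InMemoryWMS("siteB", 16)])
print(manager.submit("cfd-run", 8))   # too large for siteA, so it lands on siteB
```

The point of the sketch is only that applications talk to one manager while each site keeps its
own package behind an adapter; policy, authentication, and co-allocation are deliberately left out.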


The computational grid environment also motivates and enables new programming models and
services. The main topic of section 4 is agent-based programming models. Essentially, software
agents (or simply, agents) are software components that reside in a computing environment and are
tasked to perform certain services. While a stationary agent can only run on one machine, a mobile
agent can roam from one machine to another. Mobile agents could provide better support for mobile
computing. Mobile platforms (such as laptop, notebook, and palm-sized computers, as well as
handheld personal digital assistants, or PDAs) are characterized by limited storage and processing
capacity, limited-capacity software and runtime environments, low-bandwidth, high-latency
connections, and long periods of disconnection. Since mobile agents can migrate and access
resources in the network, they do not need a constant connection to their home platforms. Upon
completing their tasks, mobile agents send results back to the home platforms or, in case of
disconnection, temporarily store results on a docking mechanism. This operational concept
facilitates mobile computing, thus increasing both the need for and the capability of computational
grids. Alongside a growing number of commercial applications, the role of mobile computing in C^I
is well recognized, especially with the vision of small, mobile, adaptable, quick-response command
structures of the future.

Agent-based programming has implications for WMS. WMS can be agent-based; that is, WMS
functionality can be performed by system-level agents. An example is the NetSolve scheduling
agent(s) that can provide access to network resources for numerical-computing execution in a grid
environment. However, the scheduling functionality can also be assumed by application-level
agent(s) acting for the user’s application. In the Application-Level Scheduler (AppLeS), the
scheduling agent is a stationary agent that coordinates four subsystems to select resources, to
generate resource-dependent schedules, to implement the best schedule, and to monitor and predict
network and resource performance. Other examples are mobile-agent models that either let mobile
agents autonomously seek resources and migrate, or use model-based mechanisms (for example, a
microeconomic, market-oriented approach) for resource control. In general, achieving efficient
end-to-end resource utilization requires coordination between system-level scheduling and
application-level scheduling, where an application-level scheduler can specify the application’s
performance requirements.
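
The select, generate, and choose steps attributed to an application-level scheduler can be
illustrated with a toy example. The Python fragment below is a sketch under loud assumptions: the
host names, the predicted speeds, the per-host startup cost, and the runtime model (work divided
in proportion to predicted speed) are all hypothetical, and the code is not the AppLeS
implementation.

```python
from itertools import combinations


def estimate_runtime(work_units, hosts, predicted_speed):
    """Estimated runtime if work is split in proportion to each host's predicted speed."""
    total_speed = sum(predicted_speed[h] for h in hosts)
    return work_units / total_speed


def best_schedule(work_units, predicted_speed, startup_cost_per_host=0.0):
    """Enumerate candidate host sets and return (estimated_time, hosts) for the fastest one."""
    best = None
    hosts = sorted(predicted_speed)
    for k in range(1, len(hosts) + 1):
        for subset in combinations(hosts, k):          # resource selection
            estimate = (estimate_runtime(work_units, subset, predicted_speed)
                        + startup_cost_per_host * k)   # candidate schedule cost
            if best is None or estimate < best[0]:     # keep the best schedule so far
                best = (estimate, subset)
    return best


# Hypothetical forecasts (work units per second), as a monitoring subsystem might supply.
forecast = {"a": 10.0, "b": 5.0, "c": 1.0}
estimate, chosen = best_schedule(100.0, forecast, startup_cost_per_host=2.0)
print(chosen, round(estimate, 2))
```

With these assumed numbers the slowest host is left out, because its startup cost outweighs the
little work it would absorb; this is the kind of application-specific trade-off that a
system-level scheduler alone cannot see.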

Since there are many applications running simultaneously in a grid environment, application-level
agents need policies and mechanisms for cooperation; otherwise, the environment is subject to
chaotic and dangerous behavior as well as wasted resources. This is a crucial issue when thousands
or even millions of agents are deployed in a computational grid environment. For mobile agents,
another challenge is security. In particular, while agent transfer is essentially a multi-hop
operation, multi-hop authentication is unavailable. The mobility of mobile agents increases the
need to protect not only resources (for example, platforms and databases) against agents, but also
agents against resources that can be programmed to compromise or even terminate the agents. Even
when mobile-agent-enabled applications are deployed in a closed grid environment, inadvertent
damage is likely to occur through buggy software. The impacts on system security and integrity,
however, are not unlike those that result from malicious attacks.

Building computing infrastructure for future grids is an ongoing process. This is a significant
activity with far-reaching impact in shaping the 21st century. For military applications, its
implications touch many facets of future capabilities and missions.
2.  MASS-MARKET,  COMMODITY  OFF-THE-SHELF
COMPUTING  CLUSTERS 


2.1  INTRODUCTION 

The deployment, at Sandia National Laboratories, of the Intel ASCI-Red massively parallel
platform of the DoE Accelerated Strategic Computing Initiative (ASCI) project has aroused general
interest in high-performance systems built from commodity, commercial off-the-shelf hardware. The
Intel ASCI-Red, the first teraflops (trillion floating-point operations per second) computer, was
assembled with more than 9000 commodity 200-MHz Pentium Pro processors. Only the networking
components of the system are proprietary and custom-built to guarantee high-performance internode
communications. The recent drop in prices of Fast Ethernet (100 Mbps) networking hardware allows
clusters of up to hundreds of PCs, assembled using available off-the-shelf CPUs (either the 200-MHz
Pentium Pro processors or the faster Pentium II processors), to function as a high-performance
computing platform. These commodity clusters are affordable, usable, reliable, and easy to build,
maintain, and upgrade; above all, they show good price/performance. Furthermore, commodity clusters
provide quick-to-set-up prototypes and testbeds. For these reasons, a growing number of academic and
government computing and research laboratories are using commodity clusters. Their popularity was
enhanced by the capability demonstrated at the Supercomputing Conference SC’97 (San Jose, CA,
November 1997), which captured a prestigious Gordon Bell Prize for price/performance. While
traditional commodity clusters adopt the Linux operating system, there is an emerging trend toward
Windows NT. In April 1998, the High Performance Virtual Machine (HPVM) project at the University
of Illinois, Urbana-Champaign (UIUC), under the sponsorship of the Defense Advanced Research
Projects Agency (DARPA), successfully demonstrated a Windows NT cluster of 256 (300-MHz
Pentium II) processors using 128 dual-processor nodes.

In section 2, we describe the hardware and software of a typical commodity cluster, its
performance, advantages, and disadvantages. A list of selected commodity-cluster projects,
Linux-based and NT-based, is provided. We also discuss the major technical challenge to
communication performance in cluster computing, namely, that except for large-volume data
transfers, many applications are unable to take advantage of the increased network bandwidth unless
host protocol overheads are lowered. Additionally, we discuss the emerging Virtual Interface
Architecture (VIA) cluster-network standard and its approaching impact on the role of
commodity-cluster computing in high-performance parallel and distributed computing.

2.2  HARDWARE 

Currently, the typical hardware of a commodity cluster is based on the Intel 200-MHz Pentium
Pro processor because of its low price. However, there are clusters built with 300-MHz Pentium II
processors (e.g., HPVM at UIUC and Aeneas at UC Irvine). Pentium II-based clusters are quickly
becoming common as their price/performance becomes affordable.

A typical commodity cluster is usually configured with one node serving as the front-end and the
rest as computing nodes. Each node typically has one CPU, 128-MB RAM, a hard disk (2.5 GB or
more), a floppy drive, one SVGA adapter, and one Fast-Ethernet adapter. The front-end might have a
larger capacity and/or more hard disks for storage purposes, if desired. A monitor and a keyboard are
shared among all PCs using video/keyboard switch boxes. Note that the SVGA adapters on the
computing nodes are for diagnostic purposes only and thus do not require the many features that
might be needed on the front-end for multimedia capability. Furthermore, the front-end might have
extra Fast-Ethernet adapter(s) to link with the outside world.

The networking of a typical commodity cluster is based on Fast-Ethernet technology, which has a
peak data rate of 100 Mbps (megabits per second). Fast-Ethernet hardware, such as network-interface
cards (NICs), hubs, and switches, has become a mass-market, off-the-shelf commodity sold at
reasonable prices because of the migration from standard (10 Mbps) Ethernet to Fast Ethernet.
Most  vendors  no  longer  support  two  separate  product  lines,  one  for  standard-Ethernet  adapters  and 
the  other  for  Fast-Ethernet  adapters.  To  ease  the  hardware  migration  path,  vendors  are  now  offering 
10/100  Ethernet  adapters  using  autosensing  technology  to  run  at  10  Mbps  or  100  Mbps  according  to 
the  speed  of  the  hub  or  switch  port.  As  of  September  1998,  top-of-the-line  10/100  PCI  Ethernet 
adapters  are  priced  under  $70  each. 

Network topology for a typical commodity cluster is either switch-based for a small cluster (of 16
nodes, for example), or based on a combination of switches and hubs for a large cluster of up to
hundreds (and possibly thousands) of nodes. Clearly, a configuration such as a 2D Torus has better
network performance; however, because each node needs more network-adapter cards, the routing
becomes too unwieldy to be practical for a network of reasonable size.
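
The adapter-count argument above can be made concrete with a little arithmetic. The following
Python sketch assumes, as an illustration only, that every torus link is a dedicated
point-to-point cable with one adapter at each end, and that a switch-based cluster uses exactly
one adapter and one cable per node; the function names are hypothetical.

```python
def switch_based(nodes: int) -> dict:
    """Hardware count when every node has one adapter cabled to a switch port."""
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    return {"nics_per_node": 1, "total_nics": nodes, "cables": nodes}


def torus_2d(rows: int, cols: int) -> dict:
    """Hardware count for a 2D torus: each node links to its four wrap-around neighbors."""
    if rows < 3 or cols < 3:
        raise ValueError("this sketch assumes at least 3 nodes per dimension")
    nodes = rows * cols
    total_nics = 4 * nodes              # one adapter per neighbor link
    return {"nics_per_node": 4,
            "total_nics": total_nics,
            "cables": total_nics // 2}  # each cable is shared by two adapters


print("switch, 16 nodes:", switch_based(16))
print("4 x 4 torus:     ", torus_2d(4, 4))
```

Under these assumptions a 16-node torus needs four times as many adapters and twice as many cables
as the switch-based layout, and the gap in absolute terms grows linearly with the node count.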

An alternative to off-the-shelf, commodity Fast-Ethernet technology is more expensive
gigabit-per-second networking such as the proprietary Myrinet from Myricom (http://www.myri.com).
Myrinet is a family of commercial products that originated from research projects funded by DARPA
at the California Institute of Technology and the University of Southern California. Currently, the
peak data rate of the Myrinet adapters and switches is 1.28 Gbps, or 2.56 Gbps in full duplex.
Another possibility is the emerging Gigabit Ethernet technology; its standard over fiber was
ratified by IEEE in June 1998. The standardization of Gigabit Ethernet over copper is expected to be
completed by mid-1999. It is believed that as soon as standardized products are available, Gigabit
Ethernet will quickly become the widely adopted technology for short-haul networking.

We end this section by noting that there is a nine-page tutorial that describes how to build a
16-node commodity cluster such as an earlier configuration used at the California Institute of
Technology Center for Advanced Computing Research (Lindheim, 1997). Figure 1, taken from this
tutorial with the author’s permission, depicts a system setup of a 16-node commodity cluster using a
Fast-Ethernet switch.

2.3  SOFTWARE 

Traditionally, starting with the very first commodity cluster (called a Beowulf) assembled at
NASA Goddard Space Flight Center in summer 1994 using Intel DX2 processors at 50 MHz,
commodity clusters have favored Linux to leverage the availability of compilers and utilities.

Linux  is  a  Unix  clone  operating  system  used  primarily  for  the  Intel  processors.  Eventually,  many 
GNU  compilers  and  utilities  were  ported  to  Linux.  Common  compilers  and  tools  include: 

•  GNU C, C++, and FORTRAN 77 compilers,

•  X11 with development libraries, including window managers such as fvwm and its
Windows 95-emulating version, fvwm95,

•  Networking software, including NIC drivers,

•  Message-passing libraries such as MPICH and PVM,

•  Portable Batch System (PBS) for cluster queuing, and Extensible Argonne
Scheduling sYstem (EASY) for scheduling.


Figure 1. System setup of a 16-node commodity cluster using a Fast-Ethernet switch.


The Linux kernel and all of these compilers and utilities are free. Vendors package the Linux
kernel, compilers, and various utilities together with their installation applications into
so-called distributions and sell them at a nominal price, usually under $50. Intended for
stand-alone PCs, none of these distributions include message-passing libraries (such as MPICH and
PVM) or load-management software (such as PBS and EASY); however, these packages can be obtained
directly from Internet sites. The most widely used distribution is Red Hat Linux from Red Hat
Software, Inc. (http://www.redhat.com/). This distribution includes the Red Hat Package Manager
(RPM) to facilitate software installation and upgrading. Red Hat Linux 5.1 was released on 1 June
1998. Version 5.2, due in November 1998, will support symmetric multiprocessing (SMP).

Some of the bundled compilers and utilities, however, are not high-quality products. For example,
while the GNU C and C++ compilers are respectable, the GNU FORTRAN 77 (g77) compiler is
substandard. For this reason, many commodity-cluster sites choose to use PGF77, the Portland Group
FORTRAN 77 compiler, instead (Fineburg and Pedretti, 1997). The Portland Group
(http://www.pgroup.com), which built compilers for the Intel Paragon and the current Intel ASCI-Red
computer, has ported to Linux its C, C++, FORTRAN 77, Fortran 90, and HPF (High Performance
Fortran) compilers as well as its parallel-application profiler and debugger. These compilers and
tools are optimized for the Intel Pentium Pro and, most recently, Pentium II processors.

Meanwhile,  there  are  signs  of  the  emergence  of  Windows  NT-based  commodity  clusters  because 
the  most  important  building  blocks  are  in  place.  These  include: 
•  MPICH/NT, a port of MPICH to Windows NT, currently available from the University of
Mississippi (Hebert and Skjellum, 1997),

•  PVM  support  for  Windows  NT  and  Windows  95  (Geist,  1997), 

•  The  leading  load-management  software  LSF  (Load  Sharing  Facility),  from  Platform  Computing 
(http://www.platform.com),  supports  Windows  NT, 

•  The  Portland  Group  now  offers  compilers  (C,  F77,  F90,  and  HPF)  and  performance  profiler  tool 
for  Windows  NT. 

Several cluster-computing projects have been using Windows NT 4.0. Notably, the HPVM-III
cluster at UIUC, consisting of 128 dual-processor nodes (using 300-MHz Pentium II processors), was
successfully demonstrated in April 1998. HPVM-III has been using LSF for cluster management.

Because of difficulties created by the small market share of Linux, it is strongly believed that
commodity-cluster computing will have a far-reaching impact in high-performance parallel and
distributed computing if it adopts a capable operating system with a major market share such as
Windows NT.

2.4  PRICE/PERFORMANCE 

The  main  goal  of  commodity  clusters  is  not  performance  per  se,  but  price/performance,  that  is,  to 
get  better  performance  while  spending  fewer  dollars.  So  far,  commodity  clusters  have  achieved  this 
goal.  Two  events  support  the  argument: 

•  A Gordon Bell Prize for price/performance was awarded to a team for their results using an Intel
200-MHz Pentium Pro commodity cluster at the Supercomputing Conference SC’97 on
21 November 1997 (Karp, Lusk, and Bailey, 1998).

•  The Whitney project at the Numerical Aerospace Simulation (NAS) Systems Division, NASA Ames
Research Center, Moffett Field, California, is integrating commodity off-the-shelf hardware and
software technologies to incrementally build a cluster of hundreds to thousands of nodes to
support scientific workload (Jones, 1998).

As described in the December 1997 issue of IEEE Computer magazine, the Gordon Bell Prize
is named after Gordon Bell, a former National Science Foundation division director who is
now a senior researcher at Microsoft Research. Earlier in his career, at Digital Equipment
Corporation, Bell was involved with the design of the PDP series. Since 1987, he has offered annual
prizes to spur the transition of parallel processing from computer science research to useful
applications. Entries for the prizes are coordinated by the IEEE Computer Society.

The combined team consisting of the California Institute of Technology, Los Alamos National
Laboratories, NASA Goddard Space Flight Center, and the University of Louvain won the 1997 Gordon
Bell Prize for price/performance using a commodity cluster. Indeed, the California Institute of
Technology and Los Alamos had demonstrated their application at the Supercomputing Conference SC’96
in Pittsburgh. Their combined 32-node commodity cluster, formed from two 16-node 200-MHz Intel
Pentium Pro clusters, performed a 10-million-particle, tree-code benchmark and achieved a 2.19
Gflops sustained performance. The cost of the combined system was $103,000 in 1996. The
price/performance was 21 Gflops per million dollars. Unfortunately, the clusters and results were
not available at the entry date, that is, 6 months before the conference. The quoted
price/performance was three times better than that of the price/performance winning team at the
Supercomputing Conference SC’96 (Warren et al., 1997).
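
The price/performance figure quoted above follows directly from the two numbers given in the text
(2.19 Gflops sustained, $103,000 system cost). The short Python sketch below reproduces the
arithmetic; the function name is a hypothetical helper introduced only for this illustration.

```python
def gflops_per_million_dollars(sustained_gflops: float, cost_dollars: float) -> float:
    """Price/performance expressed as sustained Gflops per million dollars."""
    if cost_dollars <= 0:
        raise ValueError("cost must be positive")
    return sustained_gflops / (cost_dollars / 1_000_000)


# Figures quoted in the text for the combined 32-node cluster (1996 prices).
ratio = gflops_per_million_dollars(2.19, 103_000)
print(f"{ratio:.1f} Gflops per million dollars")
```

The result rounds to the 21 Gflops per million dollars stated in the text. Because the cost sits
in the denominator, the same sustained performance at a lower hardware price yields a
proportionally better ratio.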

The Whitney project at the NAS Systems Division is equally interesting. It is named after Eli
Whitney, who is believed to be the first to use interchangeable parts in manufacturing. Although the
notion of an affordable, off-the-shelf, commodity cluster was experimented with earlier at NASA
Goddard Space Flight Center, the Whitney project is pushing it to a larger scale. The Whitney cluster
is projected to have up to 500 processors by FY 99 and 5,000 processors in FY 2001 (Jones, 1998).
A preliminary architectural design for booting, installing, and configuring nodes for a large
cluster has been completed (Fineburg, 1997). Scalability and ease of maintenance were also
considered. The project has performed benchmarking and performance analysis of network
configurations (2D Torus, hub-based, switch-based, and hub/switch combination) on small clusters.
The final configuration will be the hub/switch hybrid, which balances performance, cost, and
manageability (Fineburg and Pedretti, 1997).

The Whitney project also performed a price/performance comparison for the current 200-MHz
Pentium Pro clusters using the proprietary Myrinet versus the commodity off-the-shelf
Fast-Ethernet networking. The comparison was based on the NAS Systems Division scientific workload,
as exemplified by NAS Systems Division application benchmarks. The study concluded that although
Myrinet had superior performance characteristics compared to Fast Ethernet, at the current processor
speed (200 MHz) it did not provide better price/performance until the cluster became extremely
large (Becker, Nitzberg, and Wijngaart, 1997). Note that price/performance results are subject to
revision as prices change. In fact, any drop in price implies better price/performance.

2.5  ADVANTAGES  AND  DISADVANTAGES 

With the advent of mass-market, off-the-shelf, high-performance microprocessors and
networking, a low-cost, scalable, fast computer can be assembled, and all the needed software
installed, in less than a week. The computer can achieve Gflops (billion floating-point operations
per second) performance for $50,000. Commodity-cluster computing has recently become a major
theme of high-performance computing in an environment of budget constraints and dissatisfaction
with proprietary and expensive systems.

The  many  advantages  of  a  commodity  cluster  can  be  summarized  as  follows: 

•  Affordability. This goal is achieved because all the hardware components are mass-market
off-the-shelf. Software (commodity off-the-shelf operating system, utilities, compilers,
message-passing libraries, and other applications) is also free or available at a nominal fee.

•  Flexibility. A commodity cluster is highly flexible for upgrading and reconfiguration because
of its modular construction.

•  Ease of setting up testbeds. Again, this goal is achieved because of modular construction
using standard off-the-shelf components.

While the affordability of commodity clusters is quite attractive in a budget-constrained
environment, the flexibility for upgrading and reconfiguration allows quick response as the market
evolves and thus assures the investment’s longevity. Evidently, in the long run, a commodity cluster
will be a heterogeneous system having a mix of several generations of hardware, each with different
performance and characteristics.




The major disadvantage of commodity clusters is believed to be the lack of a support facility for
help when a problem arises (especially with the use of Linux as the operating system and the
resulting software environment). This drawback, however, has been reduced by many online resources,
including Linux websites and newsgroups. In particular, the Linux Documentation Project is
developing reliable documentation for installing, using, and running Linux.

Currently, the main disadvantage of commodity clusters is caused by the small market share of
Linux. The freely available Linux operating system, together with off-the-shelf, mass-market PCs
and networking components, has nurtured commodity clusters. However, the small market share of
Linux platforms prevents commodity-computing clusters from playing a bigger role in
parallel/distributed computing.

The lack of vendor-supplied hardware drivers illustrates one disadvantage of the small market share
of Linux platforms. All Linux system hardware device drivers are essentially written by Linux users.
As a case in point, most vendors of Fast-Ethernet adapters supply corresponding drivers supporting
Windows NT, Windows 95, Windows for Workgroups, Novell Netware, and several Unix flavors, but
none support Linux. Most Linux Ethernet drivers were written by Donald Becker, a member of the
team that won the 1997 Gordon Bell Prize for price/performance. The name, Becker Series Drivers,
has been coined by the Linux community. Since hardware components evolve rapidly, the
small market share of Linux platforms poses a serious drawback. Steve Elbert, at the Pentium Pro
Cluster Computing Workshop, 10-11 April 1997, Des Moines, Iowa, related the following
experience: a Fast-Ethernet card ordered a month later had different components and required a
different driver update (Elbert, 1997). This market-driven disadvantage demands considerable
expertise on the part of users.

Additionally, the use of Linux as the operating system prevents new technology from spreading to
commodity clusters and thus holds back their capability and limits their usefulness. The situation
is expected to improve quickly with Intel's recent decision to invest in Red Hat Software Inc. and
to create a deeper involvement with the Linux community. However, the pervasiveness of Windows
NT platforms offers a compelling reason for using Windows NT-based commodity clusters
alongside Linux-based commodity clusters. This prescription is based on the reality that economics
favors high-volume markets. As hardware becomes cheaper, software development becomes relatively
more expensive and must rely on modular, plug-in components that are only available on platforms
with reasonable market shares.

2.6  NETWORK-INTERFACE  CHALLENGE  AND  INDUSTRY-STANDARDS  EFFORT 

Typically, a cluster interconnect network is a small-scale local-area network (LAN), often
within an enclosed space, that connects cluster elements at high speeds. In current terminology,
this is a system area network (SAN). Traditional LAN networking protocols were first developed
for long-haul communications, and their end-to-end protocol overheads are high, often several
times the transport latency. An increase in bandwidth alone therefore does not benefit
applications that send and receive small data packets, such as database operations and
fine-grained parallel programs. Additionally, multiple data copies are required as messages pass
from one cluster element to another. The corresponding overheads for copies and flushes are the
dominant factor in communication performance (Keeton, Anderson, and Patterson, 1995).
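The bandwidth argument can be made concrete with a simple cost model. The sketch below assumes that delivering one message costs a fixed per-message overhead plus the message size divided by the link bandwidth. The function names and the 100-microsecond overhead are illustrative assumptions, not measurements from the cited work.

```python
def transfer_time_us(size_bytes, overhead_us, bandwidth_mbps):
    """Time to deliver one message: fixed overhead plus serialization time."""
    # A 1 Mbit/s link moves exactly one bit per microsecond.
    return overhead_us + (size_bytes * 8) / bandwidth_mbps


def effective_bandwidth_mbps(size_bytes, overhead_us, bandwidth_mbps):
    """Payload bits delivered per microsecond, which equals Mbit/s."""
    return (size_bytes * 8) / transfer_time_us(size_bytes, overhead_us, bandwidth_mbps)


if __name__ == "__main__":
    OVERHEAD_US = 100.0  # assumed per-message protocol overhead
    for size in (64, 1024, 65536):
        for link in (100.0, 1000.0):  # Fast Ethernet and Gigabit Ethernet line rates
            t = transfer_time_us(size, OVERHEAD_US, link)
            bw = effective_bandwidth_mbps(size, OVERHEAD_US, link)
            print(f"{size:6d} B at {link:6.0f} Mbit/s: {t:8.2f} us, {bw:7.2f} Mbit/s effective")
```

Under these assumed numbers, a 64-byte message takes 105.12 microseconds at 100 Mbit/s and 100.512 microseconds at 1000 Mbit/s, so a tenfold increase in bandwidth shortens the transfer by less than 5 percent. Only large messages approach the line rate.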

Vendors have provided proprietary SAN interconnects; two examples are Myrinet from Myricom and
ServerNet from Tandem (now Compaq). Proprietary interconnects are expensive; such networks can
easily account for more than half the total cost of the entire cluster (Sterling et al., 1998).
However, academic and research organizations have been actively working on prototypes for SAN
interfaces, notably U-Net (Cornell), Active Messages (UC Berkeley), Fast Messages (UIUC), and
Virtual Memory-Mapped Communication (Princeton). These prototypes are efforts toward
low-latency networking for SANs. The common features of these research-oriented network
interfaces are (1) user-level protected network access, (2) hardware support for gather-scatter,
and (3) early demultiplexing of incoming traffic. User-level network access reduces overheads by
allowing data transfer without kernel involvement. For security reasons, this direct access must
be in a protected mode. Hardware support for gather-scatter and early demultiplexing helps
eliminate message copies (Chien, 1998a). The benefits of these efforts, however, have not been
transferred to commercial products.

Realizing that competing technologies tend to confuse rather than promote commodity-cluster
computing, Intel, in collaboration with Compaq and Microsoft, initiated a standardization
effort. In December 1997, Intel, Compaq, and Microsoft, with contributions from over 100
industry and research organizations, released version 1.0 of the Virtual Interface Architecture
(VIA) specification. The three main features for high-performance network interfaces formed the
backbone of the specification. User-level protected access allows the application to bypass the
operating system and use its virtual memory to communicate directly with the network-interface
cards (NICs). Standard interfaces for gather-scatter and early demultiplexing further promote
zero-copy operations by eliminating buffer copying and kernel overhead. In short, with the VIA,
control and setup go through the kernel, but data go directly between the application and the
NIC (Compaq, Intel, Microsoft, 1997; Compaq, 1998; Intel, 1997; Chien, 1998a).
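The gather-scatter idea can be illustrated with ordinary operating-system calls. The sketch below is not VIA and does not bypass the kernel; it only shows, under the assumption of a Unix host, how a message can be sent from two separate buffers and received into two separate buffers without first assembling one contiguous copy in the application. The helper names are hypothetical, and a local socket pair stands in for two cluster nodes.

```python
import socket


def gather_send(sock, header, payload):
    """Send header and payload as one message without concatenating them first."""
    return sock.sendmsg([header, payload])


def scatter_recv(sock, header_len, max_payload):
    """Receive one message directly into two separate, preallocated buffers."""
    header_buf = bytearray(header_len)
    payload_buf = bytearray(max_payload)
    # The buffers are filled in order: the header buffer first, then the payload buffer.
    nbytes, _ancdata, _flags, _addr = sock.recvmsg_into([header_buf, payload_buf])
    payload_len = max(nbytes - header_len, 0)
    return bytes(header_buf), bytes(payload_buf[:payload_len])


def demo():
    # A datagram socket pair preserves message boundaries (Unix-only assumption).
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        gather_send(a, b"HDR1", b"payload-bytes")
        return scatter_recv(b, header_len=4, max_payload=64)
    finally:
        a.close()
        b.close()


if __name__ == "__main__":
    print(demo())
```

In a VIA-style interface the same separation of header and payload buffers is handled by the NIC, so the data path avoids both the intermediate copy and the system call shown here.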

VIA  is  intended  to  be  operating-system  independent  and  processor-independent.  However,  it  stops 
short  of  providing  the  concrete  API  required  for  software  portability.  This  shortcoming  is  expected 
to  be  overcome  as  the  standardization  process  progresses. 

Taking the role of promoters, the three vendors (Intel, Compaq, and Microsoft) are spearheading
development of various VIA-enabled hardware and software products. The immediate target of VIA
is expected to be the emerging Gigabit Ethernet technology.

Overall, the VIA cluster network standard is an attempt to formulate a standardized,
cost-effective, high-performance cluster-interconnect solution. The widespread adoption of VIA,
when realized, will have a significant impact on the role of commodity-cluster computing in
high-performance parallel and distributed computing in the near term. In particular, VIA brings
volume to high-performance cluster interconnects, thus lowering cost and promoting commodity
off-the-shelf cluster computing. The coupling of VIA and the emerging Gigabit Ethernet
technology is expected to be the dominant driving force in SANs. This, in turn, will provide the
path to low-cost, high-performance, scalable, commodity-cluster computing.

2.7  COMMODITY  CLUSTER  PROJECTS 

While most commodity clusters are Linux-based, Windows NT clusters are emerging. Selected
clusters of both groups are listed below.

2.7.1  Linux-based  Commodity  Clusters 

The following list is incomplete; we list only four commodity clusters that have been referred
to earlier in the section. The cluster Aeneas, at the Department of Physics, University of
California, Irvine, is an example of a commodity cluster built on 300-MHz Pentium II processors.
Naegling, at the California Institute of Technology, is the outgrowth of the original cluster
(Hyglac) that, together with Loki, of Los Alamos National Laboratory, won the 1997 Gordon Bell
Prize for price/performance. Finally, Whitney, of the NAS Systems Division, NASA Ames Research
Center, will eventually reach 5000 processors. Pointers to other Linux-based commodity clusters
(and resources) can be found on the home pages of the four mentioned clusters.

•  Aeneas. Department of Physics, University of California, Irvine
http://aeneas.ps.uci.edu/aeneas/

This is a 65-node commodity cluster (1 front-end and 64 compute nodes), each node having a
300-MHz Pentium II processor, 128-MB RAM, a 3.1-GB EIDE disk, and one Fast-Ethernet card. The
networking configuration is switch-based, using two 36-port Fast-Ethernet switches.

•  Loki. Los Alamos National Laboratory
http://loki-www.lanl.gov/

Loki is a 16-node commodity cluster, each node having a 200-MHz Pentium Pro processor, 128-MB
RAM, a 3.2-GB EIDE disk, and a Fast-Ethernet card. The network has a hypercube configuration.

•  Naegling.  Center  for  Advanced  Computing  Research  (CACR),  California  Institute  of  Technology 
http://www.cacr.caltech.edu/research/beowulf/ 

This commodity cluster is currently composed of 114 nodes and two front-end systems, each having
a 200-MHz Pentium Pro processor, 128-MB RAM, a 3.1-GB EIDE disk, and a Fast-Ethernet card. Each
front-end system also has 128-MB extra RAM and an 8-GB extra disk. The network configuration is
switch-based.

•  Whitney. Numerical Aerospace Simulation (NAS) Systems Division, NASA Ames Research Center
http://parallel.nas.nasa.gov/Parallel/Projects/Whitney/

According to a NAS Systems Division report, NAS-97-024 (Fineburg, 1997), as of October 1997,
Whitney had 36 compute nodes, 3 I/O nodes, and 1 front-end. By April 1998, two extra front-end
systems had been added. Each compute node has a 200-MHz Pentium Pro processor, 128-MB RAM, a
2.5-GB EIDE disk, and a Fast-Ethernet adapter. The I/O nodes are similar to compute nodes,
except each has two 200-MHz Pentium Pro processors, 256-MB RAM, and two 9-GB SCSI disks. One
front-end is a uniprocessor system with 512-MB RAM, a 4-GB SCSI disk, and a second Fast-Ethernet
card for connecting to a general NAS network and the Internet. The two added front-ends are
333-MHz Pentium II systems with 128-MB and 512-MB RAM, respectively. The final network
configuration is expected to be a switch/hub hybrid.

2.7.2  Windows  NT-based  Commodity  Clusters 

There  is  an  emerging  trend  of  commodity  clusters  shifting  from  Linux  to  Windows  NT.  Two  im¬ 
portant  projects  are: 

•  High Performance Virtual Machines (HPVM). University of Illinois, Urbana-Champaign
http://www-csag.cs.uiuc.edu/projects/hpvm.html

The HPVM III cluster consists of 128 dual-processor machines using 300-MHz Pentium II
processors. It has in aggregate 48 GB of DRAM and 400 GB of disk space. It uses the Myrinet
network interface. The core communication layer is Fast Messages (FM), a low-level layer
designed to enable convenient layering of other APIs and protocols on top of it (Chien, 1998b;
Lauria, Pakin, and Chien, 1998). In particular, MPI-FM is a high-performance implementation of
the Message Passing Interface (MPI) based on a port of MPICH (the implementation of MPI by
Argonne National Laboratory and Mississippi State University) to FM (Lauria and Chien, 1997).

•  Virtual-Memory-Mapped  Communication  (VMMC)  II/NT,  Princeton 
http://www.cs.princeton.edu/shrimp/ntvmmc/ 

VMMC-II for Windows NT is a part of the Scalable High-Performance Really Inexpensive
Multi-Processor (SHRIMP) project at Princeton University to investigate high-performance servers
built from networks of commodity PCs and commodity operating systems. Currently, the VMMC-II/NT
cluster has eight uniprocessor PCs, each with a 300-MHz Pentium II processor and 128-MB RAM. The
VMMC model allows processes to transfer data directly between virtual memory buffers. VMMC-II/NT
is an implementation of the VMMC model on the Myrinet network interface (Dubnicki, Cezary,
Bilas, Chen, Damianakis, and Li, 1997; Dubnicki, Bilas, Li, and Philbin, 1997). Twenty-four more
PCs will be added to the current cluster.

2.8  CONCLUSIONS 

With advances in microprocessor and network technology, commodity clusters assembled from
mass-market, off-the-shelf components can now deliver scalable, high, and sustained performance
at an affordable cost. The Linux operating system, instrumental in providing the first software
environment for most commodity clusters, has reached the point where, because of market forces,
it tends to hold back the capability and limit the usefulness of commodity clusters in parallel
and distributed computing. The situation is expected to improve significantly because of Intel's
recent decision to invest in Red Hat Software Inc. and to create a deeper technical involvement
with the Linux community. However, the pervasiveness of Windows NT platforms offers a compelling
reason for using NT-based commodity clusters alongside Linux-based commodity clusters. In the
near term, the widespread adoption of the industry-standard VIA, when realized, will provide a
standardized, cost-effective, high-performance cluster-interconnect solution. The coupling of
VIA and the emerging Gigabit Ethernet is expected to be the dominant driving force in cluster
networking. This, in turn, will provide the path to low-cost, high-performance, scalable
clusters.

2.9  BIBLIOGRAPHY 

1.  The Beowulf Project at Center of Excellence in Space Data and Information Sciences (CESDIS),
Goddard Space Flight Center (GSFC)

http://cesdis.gsfc.nasa.gov/linux/beowulf/beowulf.html

The  Beowulf  Project  was  started  at  CESDIS  in  Summer  1994  with  a  16-node  (50-MHz  Intel  DX2) 
cluster.  This  project  helped  popularize  mass-market,  commodity,  off-the-shelf  computing  clusters. 
Over  time,  the  Beowulf  Project  at  CESDIS  contributed  greatly  to  the  advent  of  commodity-cluster 
computing.  In  particular,  its  contribution  to  Linux  Ethernet  device  drivers  is  critical  for  the  growth  of 
(Linux-based)  commodity  clusters.  The  project’s  home  page  serves  as  a  clearinghouse  for  informa¬ 
tion  on  Linux-based  clusters. 

2.  The Berkeley Network of Workstations (NOW) Project

http://now.CS.Berkeley.EDU/

The Berkeley NOW Project has been an innovative frontrunner in workstation-cluster computing.
The project has assembled large (100-plus node) clusters of Solaris-based UltraSPARC systems. It
has also designed the fast communication interface Active Messages, now in version 2.0.

3.  The  Virtual  Interface  Architecture  (VIA) 

http://www.viarch.org 

The  VIA  specification  (version  1.0,  December  1997)  can  be  downloaded  after  registration.  Intel, 
the  principal  promoter  of  VIA,  maintains  a  web  page  full  of  information  for  general  viewers  and  for 
developers.  The  URL  is  http://www.intel.com/design/servers/vi/index.htm 

Also of interest is the Berkeley VIA Project (http://www.cs.berkeley.edu/~philipb/via/), which
is implementing the VIA for use in cluster networking for another project. The Berkeley VIA
Project aims to develop high-quality VIA implementations for multiple platforms and to
investigate possible improvements.

3.  WORKLOAD-MANAGEMENT  SOFTWARE


3.1  INTRODUCTION 

The  proliferation  of  clusters  of  commodity  PCs,  workstations,  parallel  systems,  and  even  clusters 
of  clusters  under  the  notion  of  distributed  metacomputing  has  brought  workload-management  soft¬ 
ware  (WMS)  into  focus.  The  distributed  metacomputing  concept  encompasses  a  collection  of  plat¬ 
forms  that  virtually  operate  as  a  single,  shared-computing  resource.  WMS  is  expected  to  perform 
network-wide  workload  management  by  dynamically  scheduling  resources  and  transforming  a  dis¬ 
tributed  network  of  heterogeneous  platforms  into  a  metacomputer. 

The  purpose  of  this  section  is  to  answer  two  questions: 

•  What  are  the  leading  WMS  packages? 

•  What  features  are  desirable  when  choosing  a  WMS  package  for  a  generic  distributed  comput¬ 
ing  environment? 

A  series  of  evaluations  has  recently  answered  the  first  question.  Authors  have  assessed  packages 
using  documents  (such  as  manuals  and  user’s  guides)  available  on  the  Internet  with  feedback  from 
vendors/developers  (in  some  cases).  The  evaluation  results  give  snapshots  of  the  state  of  WMS, 
though  they  may  no  longer  be  valid  for  the  current  releases.  Moreover,  these  evaluations  are  essen¬ 
tially  paper  exercises  without  physically  setting  up,  installing,  and  testing  packages  on  different  plat¬ 
forms.  In  addition,  even  if  testing  is  performed,  many  issues  are  rather  subjective  and  difficult  to 
quantify  (Baker,  Fox,  and  Yau,  1995).  For  these  reasons,  it  is  imperative  to  distill  a  list  of  features 
that  are  most  desirable  for  a  generic  distributed  computing  environment.  This  list,  in  turn,  provides  a 
general  framework  when  selecting  WMS.  The  references  at  the  end  of  section  3  provide  details  for 
specific  requirements. 

3.2  REVIEW  OF  RECENT  EVALUATIONS 

In  this  subsection,  we  present  recent  evaluation  results  of  WMS  packages.  These  results,  from  dif¬ 
ferent  assessments,  appeared  in: 

•  A  report  for  the  British  Joint  Information  Systems  Committee  (JISC)  New  Technology  Sub 
Committee  (NTSC)  in  November  1995.  This  report  is  also  available  as  an  online  article  in  the 
DARPA-funded  National  High-Performance  Software  Exchange  (NHSE)  Review  (1996  Vol¬ 
ume,  First  Issue).  It  also  appeared  as  a  technical  report  of  the  Northeast  Parallel  Architectures 
Center  (NPAC)  at  Syracuse  University  (Baker,  Fox,  and  Yau,  1995). 

•  A White Paper of the Distributed Object Computation Testbed (DOCT) at the San Diego
Supercomputer Center (SDSC). This document was an evaluation of WMS for use in DOCT
(San Diego Supercomputer Center, 1997).

•  A series of reports from the Numerical Aerospace Simulation (NAS) Systems Division, NASA
Ames Research Center, on WMS in conjunction with Affordable High-Performance Computing,
a NASA cooperative project (Jones, 1996a; Jones, 1996b; Jones and Brickell, 1997).

The NPAC report (Baker, Fox, and Yau, 1995) provided an extensive update of an earlier WMS
evaluation conducted by NASA Langley Research Center (LaRC) (Kaplan and Nelson, 1994).
Essentially using LaRC’s set of criteria, the NPAC report produced a short list of four
commercial WMS packages (out of seven) and two research packages (out of 12) that met the
evaluation standards in the 1995 period. The four leading commercial WMS packages, in
alphabetical order, were:

•  CODINE (Computing in Distributed NEtwork)
http://www.genias.de/products/codine

•  CONNECT:queue
http://www.sterling.com

•  LSF  (Load  Sharing  Facility) 
http://www.platform.com 

•  NQE  (Network  Queuing  Environment) 
http://www.cray.com/products/software/nqe/ 

The  two  research-oriented  WMS  packages  on  the  short  list,  in  alphabetical  order,  were: 

•  DQS  (Distributed  Queueing  System) 
http://www.scri.fsu.edu/~pasko/dqs.html 

•  PBS  (Portable  Batch  System) 
http://science.nas.nasa.gov/Software/PBS/ 

Historically, the Network Queuing System (NQS), developed at NAS Systems Division, is considered
the progenitor of all WMS packages (Kaplan and Nelson, 1994). Three direct descendants of NQS
are CONNECT:queue, NQE, and PBS. CONNECT:queue, a commercial product that originated from the
software developed for the NAS Systems Division by Sterling Software, is no longer available.
Network Queuing Environment (NQE) was originally developed to support the Cray Research Inc.
(CRI) product line. Since the merger of CRI into Silicon Graphics, NQE has been ported to SGI
platforms. Portable Batch System (PBS) has been developed and deployed at the NAS Systems
Division for use on parallel and distributed systems.

The research-oriented Distributed Queueing System (DQS), developed at Florida State University,
has a commercial version labeled CODINE (Computing in Distributed NEtwork) that is marketed by
GENIAS Software GmbH (Germany), mostly in Europe. Load Sharing Facility (LSF), a product of
Platform Computing, has its roots in the research software Utopia, developed at the University
of California, Berkeley, and the University of Toronto, Canada.

Currently,  LSF  and  CODINE  are  the  two  most  widely  used  WMS  packages,  LSF  in  North  Amer¬ 
ica  and  CODINE  in  Europe.  Platform  Computing  has  successfully  allied  itself  with  major  system 
vendors  such  as  Sun,  HP,  Compaq,  Silicon  Graphics,  Digital,  Fujitsu,  Hitachi,  NEC,  and  Sony,  who 
bundle,  distribute,  and  co-market  LSF  to  their  customers. 

A more recent evaluation of WMS was conducted for use in the DARPA-funded Distributed Object
Computation Testbed (DOCT) at the San Diego Supercomputer Center. As noted in the online white
paper dated January 1997, the evaluation concentrated on commercial packages, on the grounds
that a commercial off-the-shelf product was more suitable for the project. The four leading WMS
packages that appeared in the DOCT short list, in alphabetical order, were:

•  CODINE (Computing in Distributed NEtwork)
http://www.genias.de/products/codine

•  LoadLeveler
http://www.rs6000.ibm.com/software/sp_products/loadlev.html

•  LSF  (Load  Sharing  Facility) 
http://www.platform.com 

•  NQE (Network Queuing Environment)
http://www.cray.com/products/software/nqe/

This  list  is  quite  consistent  with  the  commercial  short  list  produced  by  NPAC,  noting  that 
CONNECT:queue  is  no  longer  available.  LoadLeveler  was  not  in  the  NPAC  commercial  short  list 
because  it  only  supported  IBM  SP-2  in  1995  when  the  NPAC  evaluation  was  conducted.  Since  then, 
LoadLeveler  has  also  supported  heterogeneous  platforms. 

In the final analysis, the DOCT project chose NQE because (1) it allowed the insertion of an
external scheduler, and (2) project members were most familiar with it. At the time of
evaluation, NQE, LoadLeveler, and to some extent, CODINE, were commercial packages with
interfaces for an external scheduler. Starting with version 3.1 (released in December 1997), LSF
also provided this facility.

The most recent evaluation of WMS packages was performed for the NASA Affordable High
Performance Computing (AHPC) project (Jones and Brickell, 1997). This evaluation is the second
attempt at Phase 1 of a multiphase effort. Phase 1 was tasked to perform a pencil-and-paper
comparison of WMS packages against stated capabilities based on NAS Systems Division
requirements and vendor-supplied documentation. In Phase 2, NAS Systems Division staff and
selected users will test, in a less rigorous environment, each package that meets the minimum
requirements used in Phase 1. Finally, in the optional Phase 3, testing will be done in a
production environment with normal user workload for a 2-month evaluation (Jones, 1996b).

The original Phase 1 evaluation was performed in early 1996 at the NAS Systems Division. The
following four leading WMS packages were ranked, highest to lowest: PBS, LSF, LoadLeveler, and
NQE. However, because the packages on the market lacked required capabilities, the Phase 2
evaluation was postponed and the Phase 1 evaluation was repeated in early 1997 (Jones, 1996b).
The second Phase 1 evaluation gave the following list, from highest to lowest ranking: PBS, LSF,
CODINE, LoadLeveler, DQS, and NQE. Again, the Phase 2 evaluation was postponed because the WMS
packages still lacked required capabilities. Another Phase 1 evaluation is planned for a later
date (Jones and Brickell, 1997).

The evaluation results of the leading WMS packages can be summarized in two short lists in
alphabetical order:

Commercial:  CODINE,  LoadLeveler,  LSF,  and  NQE. 

Research:  DQS  and  PBS. 

In general, LoadLeveler, LSF, NQE, and PBS provide an exceptional service by making up-to-date,
comprehensive documentation (including manuals, user’s guides, administrator’s guides, and white
papers) available for online viewing and ready for download. This service is critical for a
pencil-and-paper assessment.

As an update, we note that most packages have had upgraded versions with new features and
improvements, while others have been unchanged since mid-1997, when the NAS Systems Division
report of the second Phase 1 evaluation was published. In particular, LoadLeveler has not
changed; its current version is 1.3, released in August 1996. CODINE 4.2 is expected to be
released in October 1998. Version 1.1.12 of PBS has been available since July 1998. NQE 3.3 was
released in March 1998. LSF has been upgraded twice since the NAS Systems Division second Phase
1 evaluation; its current version is 3.2, released in August 1998. LSF has been the de facto
leader in WMS. It is the WMS used to manage the SGI/Cray ASCI Blue Mountain at Los Alamos
National Laboratory. Blue Mountain is an SGI/Cray Origin2000, a scalable shared-memory
multiprocessor with 1024 processors. As noted in section 2, LSF is used for workload management
in the 128-node (dual-processor) Pentium II cluster HPVM III at the University of Illinois,
Urbana-Champaign.

There  is  an  article  on  WMS  from  the  user’s  perspective  in  the  April-June  (1998)  issue  of  IEEE 
Computational  Science  &  Engineering  (Papakhian,  1998).  Although  the  title  indicates  a  comparison, 
the  article  essentially  describes  functionality  and  features  of  WMS  packages.  Overall,  comparing 
WMS  packages  is  a  difficult  exercise.  It  is  subjective  and  many  issues  (including  ease  of  use,  con¬ 
figurability,  maintainability,  and  user  support)  are  hard  to  quantify  (Baker,  Fox,  and  Yau,  1995;  Pa¬ 
pakhian,  1998).  For  this  reason,  in  the  next  section,  a  list  of  features  most  desirable  for  a  generic 
distributed  computing  environment  is  presented  to  serve  as  a  general  guide  for  selecting  WMS.  The 
references  at  the  end  of  this  section  provide  details  for  specific  requirements. 

3.3  DESIRED  FUNCTIONALITY 

A  typical  WMS  package  allows  enforcement  of  resource  allocation  limits  using  common  parame¬ 
ters  including 

•  number  of  CPUs  per  node

•  number  of  nodes  per  job 

•  type  of  nodes  per  job 

•  number  of  jobs  executing  per  user 

•  number  of  jobs  executing  per  group 

•  wall  clock  time 

•  CPU  time  (per  node  and  per  application) 

•  system  time 

•  memory  utilization 

•  disk  usage 

•  swap  space 
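The limits listed above can be pictured as a simple admission check applied to each job request. The following is a minimal sketch covering only four of the parameters; the class names, field names, and limit values are hypothetical and do not correspond to the configuration syntax of any particular WMS package.

```python
from dataclasses import dataclass


@dataclass
class Limits:
    max_nodes_per_job: int
    max_wall_clock_s: int
    max_memory_mb: int
    max_running_jobs_per_user: int


@dataclass
class Job:
    user: str
    nodes: int
    wall_clock_s: int
    memory_mb: int


def violations(job, limits, running_jobs_by_user):
    """Return the names of the limits that this job request would exceed."""
    problems = []
    if job.nodes > limits.max_nodes_per_job:
        problems.append("nodes per job")
    if job.wall_clock_s > limits.max_wall_clock_s:
        problems.append("wall clock time")
    if job.memory_mb > limits.max_memory_mb:
        problems.append("memory utilization")
    # A user already at the running-job limit may not start another job.
    if running_jobs_by_user.get(job.user, 0) >= limits.max_running_jobs_per_user:
        problems.append("jobs executing per user")
    return problems


if __name__ == "__main__":
    limits = Limits(max_nodes_per_job=16, max_wall_clock_s=3600,
                    max_memory_mb=512, max_running_jobs_per_user=2)
    print(violations(Job("alice", 8, 1800, 256), limits, {"alice": 1}))
    print(violations(Job("bob", 32, 7200, 256), limits, {"bob": 2}))
```

A job with an empty violation list is eligible for scheduling; a real package would also check the remaining parameters (CPU time, disk usage, swap space, and so on) in the same manner.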

Essentially, WMS maps a job’s resource requirement onto the available resource(s) in the
metacomputing environment.

In addition to this basic functionality, many other features should be considered when choosing
a WMS package. The 10 most-desired features are listed and discussed here.

a.  Operate  in  a  heterogeneous  environment 

A  homogeneous  computing  environment  consists  of  many  platforms  with  the  same  architecture 
running  the  same  operating  system,  while  a  heterogeneous  environment  is  characterized  by  plat¬ 
forms  of  dissimilar  architectures  and  different  operating  systems.  Every  environment  will  eventually 
become  heterogeneous.  Even  an  environment  that  starts  from  a  homogeneous  system  will  gradually 
become  a  mixture  of  several  generations,  each  with  different  performance  characteristics  because  of 
expansion,  upgrading,  and  reconfiguration.  Heterogeneity  also  occurs  normally  in  wide-area  distrib¬ 
uted  computing  where  the  notion  of  the  metacomputer  encompasses  computing  resources  at  several 
sites,  possibly  geographically  dispersed  and  under  different  administrative  domains.  Consequently,  a 
WMS  package  that  supports  only  a  homogeneous  environment  is  limited  in  distributed  computing. 

b.  No  single  point  of  failure  (SPF) 

A  single  point  of  failure  (SPF)  occurs  when  the  WMS  package  only  runs  with  one  designated 
master  host.  In  such  a  case,  when  the  master  host  fails,  the  whole  system  fails.  For  fault  tolerance, 
some  WMS  packages  allow  the  designation  of  an  alternate  master  host.  Preferably,  every  host  other 
than  the  master  host  should  serve  as  an  alternate  host  so  that  when  the  master  host  fails,  another  host 
takes  over,  and  the  load-management  services  are  available  as  long  as  any  host  is  operational. 
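The preferred behavior, in which any surviving host can take over as master, can be sketched as follows. This is only an illustration of the idea under the assumption of a fixed priority order and a reliable liveness test; the host names and the `is_alive` callback are hypothetical, and real packages rely on heartbeat and election protocols that are not described here.

```python
def elect_master(hosts, is_alive):
    """Return the first live host in priority order, or None if every host is down."""
    for host in hosts:
        if is_alive(host):
            return host
    return None


def demo():
    hosts = ["master", "node1", "node2"]  # priority order; hypothetical names
    down = set()
    history = [elect_master(hosts, lambda h: h not in down)]
    # Fail the hosts one by one and record who is master after each failure.
    for failed in ("master", "node1", "node2"):
        down.add(failed)
        history.append(elect_master(hosts, lambda h: h not in down))
    return history


if __name__ == "__main__":
    print(demo())
```

Load-management service is lost only in the final step, when no host remains operational.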

c.  Provide  security  protection  and  support  distributed  file  systems 

Access control to remote execution must be provided. The two most common authentication
approaches are: (1) weak authentication, represented by an identification protocol such as
RFC-931 (St. Johns, 1985) or RFC-1413 (St. Johns, 1993); and (2) strong authentication, such as
the Kerberos network authentication service (Neuman and Tso, 1994). RFC-931 and the more recent
RFC-1413 are connection-based applications on TCP that use an identification daemon running on
each client host. This form of authentication has security limits. As noted in RFC-1413, the
information returned by these identification protocols is, at most, as trustworthy as the host
providing it; if the host has been compromised, the information obtained may be misleading or
incorrect. Consequently, authentication using identification protocols, also called
authentication by assertion, is used only to provide more auditing information for TCP
connections. Stronger authentication methods based on cryptography are required. One such method
is the Kerberos network authentication service, a secret-key authentication system developed at
MIT under Project Athena. Version 5 (V5) of Kerberos is specified in RFC-1510 (Kohl and Neuman,
1993). Kerberos offers three different levels of protection: (1) authentication at the beginning
of a network connection, (2) authentication for each message sent from one host to another, and
(3) both authentication and encryption for each message sent from one host to another. More
recent implementations of Kerberos support the Generic Security Service Application Program
Interface (GSSAPI) described in RFC-1508.
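The weakness of authentication by assertion is easy to see in the wire format of the identification protocol. The sketch below builds an RFC-1413 query and parses a reply line; the function names are hypothetical, the sample lines follow the style of the examples in the RFC, and the parser is simplified in that it strips surrounding whitespace from every field.

```python
IDENT_PORT = 113  # TCP port assigned to the identification protocol


def build_ident_query(port_on_server, port_on_client):
    """Format the one-line query naming the TCP connection of interest.

    In RFC-1413 terms, the "server" is the host running the identification daemon.
    """
    return f"{port_on_server}, {port_on_client}\r\n".encode("ascii")


def parse_ident_reply(line):
    """Split one reply line into its fields (whitespace around fields is stripped)."""
    text = line.decode("ascii", errors="replace").rstrip("\r\n")
    parts = text.split(":", 2)
    if len(parts) != 3:
        raise ValueError(f"malformed ident reply: {text!r}")
    ports, resp_type, info = (p.strip() for p in parts)
    port_on_server, port_on_client = (int(p) for p in ports.split(","))
    reply = {"port_on_server": port_on_server,
             "port_on_client": port_on_client,
             "type": resp_type}
    if resp_type == "USERID":
        opsys, _, user = info.partition(":")
        reply["opsys"] = opsys.strip()
        reply["user"] = user.strip()  # asserted by the remote host, not verified
    else:
        reply["error"] = info
    return reply


if __name__ == "__main__":
    print(build_ident_query(6193, 23))
    print(parse_ident_reply(b"6193, 23 : USERID : UNIX : stjohns\r\n"))
    print(parse_ident_reply(b"6195, 23 : ERROR : NO-USER\r\n"))
```

Nothing in the reply is cryptographically protected: the user name is simply whatever the remote daemon chooses to report, which is why such replies are suitable for auditing but not for access control.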

Kerberos, however, has two fundamental limitations. First, Kerberos is not effective against
password-guessing attacks. That is, if a password is poorly chosen, an attacker may be able to
guess it and impersonate the user. Second, Kerberos requires a trusted path for password entry.
If an attacker can monitor the path between the user and the initial program, then the user can
be impersonated. Therefore, other tools, such as one-time passcodes and public-key cryptography,
should be combined with Kerberos to improve security (Neuman and Tso, 1994). Although Kerberos
is based on conventional symmetric-key cryptography (the Data Encryption Standard, DES), recent
extensions allow integration with public-key cryptography systems.

Most operating systems support distributed file systems such as the Network File System (NFS)
and/or the Andrew File System (AFS), and some even support the Open Software Foundation (OSF)
Distributed File System (DFS). These distributed file systems allow consistent data access on
all machines across the network. Furthermore, DFS also provides security under the OSF
Distributed Computing Environment (DCE) via credential transfer or other authentication
services.

d.  Provide  Scheduling  API  for  external  scheduler 

For  any  WMS  package,  no  matter  how  many  scheduling  policies  exist,  or  how  flexible  they  are, 
eventually,  an  external  scheduler  must  be  inserted.  The  two  most  common  reasons  for  having  an 
external  scheduler  are: 

•  To enforce a specific scheduling policy that is not available in the WMS package. For example,
Argonne National Laboratory and Cornell Theory Center inserted the Extensible Argonne
Scheduling sYstem (EASY) into LoadLeveler to enforce a back-filling scheduling policy on
the IBM SP-2 (Lifka, 1995; Skovira, 1996; Prenneis, 1996).

•  To experiment with a research scheduler. For example, the Distributed Object Computation
Testbed (DOCT) at San Diego Supercomputer Center planned to plug the research scheduler,
Application-Level Scheduler (AppLeS), developed at the University of California, San Diego,
into the WMS package, NQE (San Diego Supercomputer Center, 1997).

e.  Support  parallel  and  interactive  jobs 

Parallel  applications  are  mostly  programmed  using  the  message-passing  model.  The  two  most 
widely  used  message-passing  libraries  are  the  Message  Passing  Interface  (MPI)  and  the  Parallel 
Virtual Machine (PVM). PVM had been the de facto leader before the message-passing standardization
effort. Since the ratification of MPI as the industry standard, many applications using the PVM
library have been converted to use MPI calls.

At a higher level, High Performance Fortran (HPF) is the standardized extension of Fortran 90 that
supports  parallel  processing.  HPF  has  been  advancing  in  scientific  and  engineering  applications. 
WMS  should  support  HPF,  if  needed. 

Support for interactive jobs is critical for software development, especially in the debugging
stage. Many traditional database applications also require interactive-processing support. In addition,
a growing number of organizations are making data available on intranets, where information is
accessed by way of web browsers or web-based applications. These interactive uses are expected to
expand in the future.

f.  Support  a  parallel  job  that  requires  a  specified  number  of  processors  allocated  to  each  host 

This functionality allows the most flexible use of hosts that are multiprocessors. At one extreme, a
parallel job may be scheduled onto a single multiprocessor host to take advantage of its efficient
shared memory. At the other extreme, it may be spread out with one process per host to take advantage
of aggregate memory, swap space, or parallel I/O. When the former case is not possible, the job may
be scheduled somewhere between the two extremes. For example, in a cluster consisting of
single-processor and dual-processor hosts, a parallel job running on four processors cannot
be scheduled to a single host but may be run on two dual-processor hosts.
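
The placement choice described above can be sketched as follows. This is only an illustration under stated assumptions: the function name place_job, the host names, and the rule of preferring the smallest hosts that fit are hypothetical and are not taken from any WMS package.

```python
from typing import Dict, Optional


def place_job(free_cpus: Dict[str, int], procs_needed: int,
              procs_per_host: int) -> Optional[Dict[str, int]]:
    """Return {host: processors} with exactly procs_per_host on each chosen
    host, or None when the request cannot be satisfied."""
    if procs_per_host <= 0 or procs_needed % procs_per_host != 0:
        return None
    hosts_needed = procs_needed // procs_per_host
    # Assumed policy: use the smallest hosts that still fit, so that larger
    # multiprocessor hosts stay free for jobs that need them.
    candidates = sorted(
        (h for h, n in free_cpus.items() if n >= procs_per_host),
        key=lambda h: (free_cpus[h], h),
    )
    if len(candidates) < hosts_needed:
        return None
    return {h: procs_per_host for h in candidates[:hosts_needed]}


# Hypothetical cluster of two single-processor and two dual-processor hosts.
cluster = {"uni1": 1, "uni2": 1, "dual1": 2, "dual2": 2}
single_host = place_job(cluster, 4, 4)   # four processors on one host
two_duals = place_job(cluster, 4, 2)     # two processors on each of two hosts
```

With this cluster, the four-processor request on a single host cannot be placed, while the request for two processors per host lands on the two dual-processor hosts, matching the example in the text.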

g.  Provide statistical reports on system resource usage by users on different hosts

Statistical  reports  on  usage  of  system  resources  by  users  on  different  hosts  under  various  queues 
can  be  processed  from  job  logs.  Statistics  should  include  the  average,  minimum,  maximum,  and  total 
of common measures such as number of jobs, throughput, turnaround time, wait time, CPU time,
user  time,  system  time,  memory,  and  swap  space.  These  reports  are  valuable  for  system  performance 
assessment,  workload  characterization,  resource  bottleneck  identification,  and  system  upgrade  plan¬ 
ning. 
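
As a minimal sketch of how such a report could be derived from job logs, the following groups records by user and computes the count, average, minimum, maximum, and total of one measure. The record layout and the names JOB_LOG and summarize are assumptions made for illustration; real WMS job logs have their own formats.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical job-log records: (user, host, wait_seconds, cpu_seconds).
JOB_LOG = [
    ("alice", "hostA", 30.0, 600.0),
    ("alice", "hostB", 10.0, 200.0),
    ("bob", "hostA", 60.0, 1200.0),
]


def summarize(records, field_index):
    """Per-user statistics for the measure stored at field_index."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[0]].append(rec[field_index])
    return {
        user: {
            "jobs": len(values),
            "avg": mean(values),
            "min": min(values),
            "max": max(values),
            "total": sum(values),
        }
        for user, values in groups.items()
    }


cpu_report = summarize(JOB_LOG, 3)    # CPU time per user
wait_report = summarize(JOB_LOG, 2)   # wait time per user
```

Grouping by host or by queue instead of by user would only require changing the grouping key.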

h.  Provide  system  performance  prediction  and  feedback  to  scheduler 

Although  the  reports  noted  above  are  useful,  they  are  static.  Preferably,  the  WMS  should  also  pro¬ 
vide  dynamic  system-load  data  collected  for  short  time  intervals  to  support  system  performance  pre¬ 
diction  and  feedback  to  the  scheduler.  Forecasts  for  network  performance  (latency  and  bandwidth)  as 
well  as  system  performance  (CPU  usage)  for  each  host  are  of  considerable  interest  to  dynamic 
scheduling  and  developing  quality-of-service  (QoS)  guarantees  (Wolski,  1997). 
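
To illustrate how short-interval load samples could feed a forecast back to the scheduler, here is a minimal sketch that uses simple exponential smoothing. The technique, the function name smooth_forecast, and the smoothing factor are assumptions chosen for illustration; this is not the forecasting method of Wolski (1997) or of any WMS package.

```python
def smooth_forecast(samples, alpha=0.5):
    """Forecast the next value of a load series by exponential smoothing.

    samples: recent measurements, oldest first (for example CPU usage).
    alpha:   weight given to the newest sample, between 0 and 1.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    forecast = samples[0]
    for value in samples[1:]:
        forecast = alpha * value + (1 - alpha) * forecast
    return forecast


# Hypothetical CPU-usage samples collected at short intervals.
cpu_usage = [0.0, 1.0, 1.0]
next_usage = smooth_forecast(cpu_usage, alpha=0.5)
```

A larger alpha makes the forecast follow recent changes more closely; a smaller one damps short-lived spikes.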

i.  Support  checkpointing,  restart,  and  job  migration 

Checkpointing is a useful means to save the state of a job (at regular time intervals) so it can be
restarted later. Checkpointing is essential for implementing fault tolerance and load balancing. If
the execution host goes down, the checkpointed job can be restarted from the last checkpoint instead
of from the beginning, either on the same execution host when it becomes available or on a different
host. Likewise, a checkpointed job running on an overloaded host can be restarted on an idle or
lightly loaded host; such job migration enhances load balancing.

There  are  two  types  of  checkpointing:  system-initiated  and  user-initiated.  System-initiated  (also 
called  OS-level  or  kernel-level)  checkpointing  is  automatically  performed  when  requested  by  the 
submitted  job  without  requiring  the  code  to  be  modified  and  linked  with  any  special  library.  User- 
initiated  checkpointing  can  be  either  user-level  checkpointing  or  application-level  checkpointing.  At 
the  user  level,  application  programs  need  to  be  linked  with  checkpoint  libraries,  whereas  at  the  ap¬ 
plication  level,  programs  are  explicitly  coded  using  checkpointing  function  calls  and  need  to  be 
linked  with  checkpoint  libraries. 
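
Application-level checkpointing, where the program itself saves and restores its state, can be sketched as below. This is a minimal illustration, assuming the job state is a small tuple saved with Python's pickle module; the function run_sum and the stop_after parameter (which simulates a host failure) are hypothetical.

```python
import os
import pickle
import tempfile


def run_sum(n, checkpoint_path, every=100, stop_after=None):
    """Sum 0..n-1, saving (next_index, total) every `every` iterations."""
    # Restart from the last checkpoint if one exists.
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path, "rb") as f:
            i, total = pickle.load(f)
    else:
        i, total = 0, 0
    while i < n:
        if stop_after is not None and i >= stop_after:
            return None  # simulated host failure: in-memory state is lost
        total += i
        i += 1
        if i % every == 0:
            # Write to a temporary file first so that a crash during the
            # write cannot corrupt the previous checkpoint.
            tmp_path = checkpoint_path + ".tmp"
            with open(tmp_path, "wb") as f:
                pickle.dump((i, total), f)
            os.replace(tmp_path, checkpoint_path)
    return total


path = os.path.join(tempfile.mkdtemp(), "job.ckpt")
first = run_sum(1000, path, every=100, stop_after=250)  # interrupted run
result = run_sum(1000, path, every=100)                 # resumes at i = 200
```

The second call resumes from the checkpoint taken at iteration 200, so only the work done between iterations 200 and 249 is repeated.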

Although  the  usefulness  of  checkpointing  is  well-understood,  the  current  state  of  implementation 
shows  many  limitations: 

•  Checkpointing  of  parallel  jobs  is  complicated  by  network  activity  that  causes  difficulty  in 
capturing  the  state  of  a  parallel  application.  WMS  packages  mostly  support  checkpointing  for 
serial  jobs. 

•  Checkpointing  inherently  requires  a  lot  of  disk  space  because  it  creates  a  checkpoint  file 
(snapshot  of  the  application  at  the  checkpoint  time)  that  will  be  coupled  with  the  original  ex¬ 
ecutable  to  form  a  new  user  executable  that  is  needed  when  restarting  the  job.  Checkpoint  files 
are  large,  and  writing  checkpoint  files  to  disk  not  only  creates  an  overhead  but  may  have  im¬ 
pact  on  the  file  system  performance  and  network  bandwidth.  Consequently,  the  cost  of  check¬ 
pointing  may  outweigh  the  benefit  for  (serial)  jobs  that  run  in  a  short  time. 

•  The  checkpointed  job  can  only  be  restarted  on  a  host  that  has  the  same  architecture  and  runs 
the  same  operating  system  as  the  host  on  which  the  checkpoint  was  created. 


•  To  prevent  unpredictable  results,  jobs  intended  for  checkpointing  must  be  statically  linked, 
cannot  use  internal  timers,  and  should  limit  the  use  of  other  system  calls  for  signals,  forks, 
shared  memory,  semaphores,  messages,  and  threads. 

Consequently,  the  status  of  checkpoint  implementation  in  WMS  packages  suggests  that  it  is  suited 
and  useful  for  serial  jobs  that  are  long-running  and  compute-intensive. 

j.  Support  Windows  NT  and  allow  a  mixed  NT/Unix  environment 

Market  momentum  for  Windows  NT  continues  to  build,  as  shown  by  the  acceptance  of  Windows 
NT  as  the  chosen  operating  system  for  new  networked  PCs.  The  mixed  NT/Unix  environment  will 
soon be common, considering the realization of programs such as the Navy’s Information Technology
for the 21st Century (IT21) initiative. To strategically position itself, a WMS package must exploit
this heterogeneous environment.

3.4  CHALLENGES 

There  are  many  challenges  for  WMS  to  efficiently  support  a  large-scale,  heterogeneous,  distrib¬ 
uted  computing  environment.  In  particular,  one  can  find  several  major  challenges  from  the  list  of  the 
10  desirable  features  in  the  last  section.  Two  issues  of  immediate  importance  are: 

a.  Checkpointing.  The  goal  to  provide  universally  available  checkpoint  services  for  serial  and  par¬ 
allel  jobs  in  a  wide-area,  heterogeneous,  distributed  computing  environment  remains  elusive. 
Difficulties  arise  in  the  heterogeneity  of  the  environment.  Differences  in  vendor  implementation 
of  operating  systems,  different  architectures,  and  various  support  levels  for  user-initiated  check¬ 
pointing  make  providing  ubiquitous  checkpoint  services  a  challenging  task  (Livny  and  Raman, 
1998). While checkpointing for serial jobs has made great advances (Litzkow, Tannenbaum,
Basney, and Livny, 1997), checkpointing for parallel jobs (especially for applications using the
industry-standard MPI) is still in the research stage (Pruyne and Livny, 1996).

b.  Interoperability. A large-scale distributed computing environment typically spans several sites
under different administrative domains. Resources of these sites are often heterogeneous and managed
by different WMS packages, provided mostly by platform vendors. At present, WMS packages are not
interoperable. Interfacing with various WMS packages is a great challenge to wide-area,
heterogeneous, distributed computing.

While  checkpointing  is  the  most  difficult  goal  to  achieve  for  practical  purposes  (Livny  and  Ra¬ 
man,  1998),  recently  there  has  been  design  and  experimentation  of  tools  to  provide  workable  inter¬ 
faces  to  different  WMS  (Czajkowski,  et  al.,  1997;  Brunett  and  Fitzgerald,  1998).  The  next  section 
discusses  this  effort  and  other  recent  developments  related  to  WMS  and  wide-area,  heterogeneous, 
distributed  computing.  These  developments,  although  still  in  preliminary  stages,  are  expected  to 
have  a  profound  impact  in  the  future. 

3.5  CONCLUSIONS 

With  the  emergence  of  the  concept  of  distributed  metacomputing  and  the  advent  of  high- 
performance,  broadband  networking,  the  focus  is  on  WMS  that  is  expected  to  transform  a  collection 
of  geographically  dispersed,  administratively  separated,  heterogeneous  platforms  into  virtually  a 
single, shared-computing resource. WMS has made good progress in recent years. However, its current
limitations are well recognized. A short list of the 10 most desired features was presented in
section 3 as a general guide for choosing a WMS package. The following references provide the
details for specific requirements.

3.6  BIBLIOGRAPHY 

1.  LoadLeveler,  LSF,  NQE,  and  PBS  provide  up-to-date,  comprehensive  documentation  for  online 
viewing  and  download. 

•  LoadLeveler  Documentation 

Current  version  1.3,  released  in  August  1996 

http://www.rs6000.ibm.com/resource/aix_resource/sp_books/loadleveler/index.html 

•  LSF  Documentation 

Current  version  3.2,  released  in  August  1998 

http://www.platform.com/content/products/documentation/default.html 

•  NQE  Documentation 

Current  version  3.3,  released  in  March  1998 
http://www.cray.com/products/software/nqe/documentation.html 

•  PBS  Documentation 

Current  version  1.1.12,  released  in  July  1998 
http://science.nas.nasa.gov/Software/PBS/docs.html 

2.  A seminal book titled “The Grid: Blueprint for a New Computing Infrastructure,” edited by
Ian Foster and Carl Kesselman, published by Morgan Kaufmann Publishers, offers a good
source  on  the  state  of  the  art  in  wide-area,  high-performance  distributed  computing.  The  ter¬ 
minology  “grid”  refers  to  the  analogy  with  electrical  power  grid  to  convey  the  notion  that 
computational  grids  have  the  potential  to  provide  access  to  computational  resources  in  a  per¬ 
vasive,  inexpensive,  and  dependable  manner. 

4.  AGENTS, WORKLOAD MANAGEMENT, AND WIDE-AREA,
HETEROGENEOUS, DISTRIBUTED COMPUTING


4.1  INTRODUCTION 

Intelligent  agents  have  moved  from  the  abstract  notions  of  artificial  intelligence  to  many  diverse 
implementations  fostered  by  the  explosive  growth  of  the  Internet  and  electronic  commerce.  These 
software  agents  (or  simply,  agents)  are  touted  as  able  to  book  airline  tickets,  monitor  stock  quotes, 
and  perform  many  other  daily  transactions  as  well  as  to  mine  data  for  competitive  enterprises  and 
organizations.  Agents,  now  recognized  as  a  booming  field  of  distributed  computing,  provide  a  new 
framework  to  facilitate  many  services  and  applications. 

In  general,  there  are  two  types  of  agents:  stationary  (or  anchored)  agents  and  mobile  (or  transport¬ 
able)  agents.  While  a  stationary  agent  can  run  on  only  one  machine  (either  on  a  client  or  on  a  server), 
a  mobile  agent  can  roam  from  one  machine  to  another.  Agents  provide  a  framework  facilitating 
wide-area,  heterogeneous,  distributed  computing,  especially  easing  the  workload-management  task. 
In  addition,  mobile  agents  could  provide  an  efficient  environment  for  mobile  computing. 

In  this  section,  we  present  an  overview  of  agents’  roles  in  wide-area,  heterogeneous,  distributed 
computing,  and  discuss  advantages  and  challenges  of  the  emerging  technology.  The  section  is 
structured  as  follows:  in  section  4.2,  we  motivate  the  presentation  with  two  mobile-agent  distributed 
computing  models  and  give  a  comparison.  Section  4.3  discusses  recent  trends  and  developments  in 
workload-management  software.  Of  particular  interest  are  (1)  the  explicit  adoption  of  agents  in 
scheduling  at  least  at  the  research  level,  (2)  the  use  of  a  system  performance  monitoring  service  to 
facilitate  dynamic  scheduling  and  fault  detecting,  and  (3)  the  effort  to  coordinate  resources  under 
different  local  workload-management  software.  Conceptually,  agent-based  monitoring  and  manage¬ 
ment  can  help  mitigate  architectural  complexity  and  enhance  performance.  Section  4.4  discusses  the 
advantages  and  challenges  of  agent-based  approaches.  Finally,  section  4.5  presents  a  summary  and 
conclusion. 

4.2  TWO  MOBILE-AGENT  DISTRIBUTED  COMPUTING  MODELS 

To  facilitate  discussion,  we  will  describe  two  models  for  distributed  computing  using  mobile 
agents.  Both  models  are  based  on  the  client-server  paradigm  and  the  classic  Master-Slave  pattern. 
The  main  difference  between  the  two  models  is  the  degree  of  the  agents’  autonomy  with  respect  to 
mobility. The overall implication is nontrivial, as noted in the comparison in section 4.2.2.

4.2.1  Two  Models 

The  first  model  was  described  in  a  tutorial  (Sommers,  1997)  that  appeared  in  the  online  magazine, 
JavaWorld,  in  April  1997,  and  that  helped  popularize  the  use  of  mobile  agents  in  distributed  com¬ 
puting.  The  article  described  how  to  create  mobile  agents  in  Java  using  IBM’s  Aglets  Workbench 
(now  called  Aglets  Software  Development  Kit)  to  build  two  applications:  a  distributed  search  engine 
and,  more  interestingly,  a  virtual  supercomputer.  The  latter  is  described  in  this  section. 

As noted in Sommers (1997), the idea of harvesting the processing power of a virtual supercomputer
composed of Internet-connected machines via mobile agents had been raised (incidentally) in
another author’s article (Vanhelsuwe, 1997), also in JavaWorld, 2 months earlier. Essentially, a
virtual supercomputer can be built using the standard Master-Slave model where the Master spawns one
or more Slaves to perform a specific task. To facilitate load balancing, a Slave called LoadGatherer is
tasked to provide the Master with load information so that the Master can redistribute spawned Slaves
whenever load conditions change dramatically. Therefore, in this model, besides specifying the task
performed by each Slave, the Master also performs the function of workload-management software.

Although in the article (Sommers, 1997) these agents (Master, Slaves, and LoadGatherer) were
implemented in Java as aglets (short for agent applets), the overall architecture is
language-independent. Java is mostly preferred in implementing agent applications simply to ensure
platform independence.

The agents’ mobility facilitates redistribution of Slaves. Furthermore, mobility allows more
possibilities in implementing the LoadGatherer, thus promoting a better load distribution.

In  addition,  of  particular  interest  is  the  possibility  that  mobile  agent  technology  will  enable  the 
class  of  mobile  clients  to  participate  in  wide-area,  heterogeneous,  distributed  computing.  Devices  of 
this  class,  ranging  from  laptop  and  notebook  (as  well  as  the  forthcoming  palm-sized)  computers  to 
handheld  personal  digital  assistants  (PDAs),  can  get  relevant  results  by  launching  mobile  agents  to 
perform  necessary  computation  and  filtering  on  high-performance  platforms  (Chess  et  al.,  1995; 
Chess  et  al.,  1994). 

This  mobility  of  agents,  however,  can  also  create  security  threats  since  a  client  can  implant  mali¬ 
cious  tasks  into  agents  or  change  their  states.  Security  concerns  have  been  recognized  in  deploying 
mobile-agent  applications  (for  example,  see  Chess  et  al.  [1994])  and  Java-based  mobile-agent  appli¬ 
cations  (for  example,  see  Karjoth,  Lange,  and  Oshima  [1997]).  Many  security  issues  have  not  been 
resolved  and  security  mechanisms  are  among  currently  active  research  topics.  For  example,  while 
authenticators  (algorithms  to  determine  an  agent’s  authenticity)  for  the  one-hop  mode  of  mobility  are 
well  studied  and  understood,  the  multi-hop  authentication  has  not  been  solved.  This  complicates  the 
current security situation because the agent transfer (also called agent transport) is considered a
multi-hop operation (Crystaliz, General Magic, GMD FOKUS, IBM, and the Open Group, 1997). In
a closed distributed computing environment, the principal concern is not so much malicious attacks
as accidents that result from errors when deploying mobile-agent-enabled software.
Note that the effects of accidental security violations, no different from those of intentional ones,
could be detrimental to the safety and integrity of system resources, including databases.

Since the LoadGatherer checks workload conditions and reports them to the Master, it does not
contribute to the overall processing task performed by the other Slaves. An alternative to this
model is to eliminate the LoadGatherer. Functionally, all agents will set out to find resources at
run time to perform their specified tasks. In this case, the Master, no longer engaging in the
workload-management task, participates in the overall processing and, if necessary, coordinates with
Slaves for gathering results and performing I/O tasks.

In a mobile-agent-based system, a mechanism for dynamic measurement of resource and network
utilization, and for dissemination of information to agents, is needed. Above all, cooperation
between agents is essential to ensure that agents are migrating efficiently from heavily loaded to
lightly loaded nodes. Without cooperation, mass migration of agents ever-seeking rescheduling could
result in chaotic and dangerous behavior (“thrashing”) as well as waste of resources.

One experimental approach based on microeconomic models has been explored in Bredin, Kotz,
and Rus (1998a) and Bredin, Kotz, and Rus (1998b). In this approach, each resource has a resource
manager that sets the price for the resource under its control. The price is set at initialization and
then changes dynamically to adapt to supply-demand conditions of resources in the system. One
relevant model is seller-driven, characterized by many buyers competing for the same resources. Each
agent has a pre-allocated budget for bidding. In addition, there is a model for option trading to
counter market fluctuations.
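
The idea of a resource manager whose price tracks supply and demand can be sketched as follows. The update rule (a proportional adjustment by relative excess demand), the class name ResourceManager, and the helper affordable_buyers are assumptions made for illustration; they are not the pricing policy of Bredin, Kotz, and Rus.

```python
class ResourceManager:
    """Sets the price of one resource and adapts it to demand."""

    def __init__(self, capacity, initial_price, rate=0.1, floor=0.01):
        self.capacity = capacity      # units of the resource on offer
        self.price = initial_price    # set once at initialization
        self.rate = rate              # how strongly the price reacts
        self.floor = floor            # the price never drops below this

    def update_price(self, demand):
        # Assumed rule: raise the price when demand exceeds capacity and
        # lower it when capacity is underused.
        excess = (demand - self.capacity) / self.capacity
        self.price = max(self.floor, self.price * (1 + self.rate * excess))
        return self.price


def affordable_buyers(budgets, price):
    """Names of agents whose pre-allocated budget covers the price."""
    return sorted(name for name, budget in budgets.items() if budget >= price)


manager = ResourceManager(capacity=10, initial_price=1.0, rate=0.5)
price_when_busy = manager.update_price(demand=20)   # demand is twice capacity
price_when_idle = manager.update_price(demand=5)    # demand is half capacity
buyers = affordable_buyers({"agent_a": 2.0, "agent_b": 1.0}, price_when_busy)
```

When the price rises under contention, agents with small budgets are priced out of the busy resource and must look elsewhere, which is the load-spreading effect the market model aims for.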

A  more  traditional  approach  to  an  adaptive  placement  framework  was  recently  described  in  Keren 
and  Barak  (1998).  The  load  index  of  a  node,  based  on  the  node’s  utilization,  is  communicated  among 
agents so that at each time unit, each agent has information about a subset of nodes in the system.
The migration decision is executed periodically by each agent. When the expected performance gain
is  above  a  predetermined  threshold,  migration  follows  after  a  reservation  message  is  communicated 
to  prevent  concurrent  migration. 
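
A minimal sketch of this threshold-based decision follows. It assumes that the expected gain is simply the difference between the load indices of the current node and the least-loaded known node, and it models the reservation message as a method call; the names choose_target and Node are hypothetical and do not come from Keren and Barak (1998).

```python
def choose_target(current, known_loads, threshold):
    """Pick a migration target from the nodes this agent knows about.

    known_loads maps node name to load index and must include `current`.
    Returns the target node, or None when the gain is not worth a move.
    """
    best = min(known_loads, key=lambda node: (known_loads[node], node))
    gain = known_loads[current] - known_loads[best]
    if best != current and gain > threshold:
        return best
    return None


class Node:
    """Accepts at most one reservation, so two agents cannot migrate to
    the same node concurrently."""

    def __init__(self):
        self.reserved_by = None

    def reserve(self, agent_id):
        if self.reserved_by is None:
            self.reserved_by = agent_id
            return True
        return False


loads = {"n1": 0.9, "n2": 0.2, "n3": 0.5}   # the subset known to one agent
target = choose_target("n1", loads, threshold=0.3)
node = Node()
first_reservation = node.reserve("agent_1")
second_reservation = node.reserve("agent_2")
```

An agent would migrate only after its reservation is granted; an agent whose reservation is refused keeps running where it is and reconsiders at the next period.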

A  more  sophisticated  facility  to  provide  dynamic  forecasts  of  network  and  resource  load  condi¬ 
tions  can  be  incorporated.  The  agent-based  system,  AppLeS,  the  application-level  scheduler,  uses  a 
separate  facility  called  the  Network  Weather  Service  (NWS)  that  parameterizes  performance  models 
to  predict  the  state  of  resources  at  the  time  the  application  will  be  scheduled.  Besides  the  usual  re¬ 
source-availability  parameters,  NWS  also  includes  network-performance  parameters  such  as  band¬ 
width  and  latency.  Both  AppLeS  and  NWS  have  been  developed  at  the  University  of  California,  San 
Diego  (Berman,  1998;  Berman  and  Wolski,  1997;  Berman  et  al.,  1996;  Spring  and  Wolski,  1998). 

4.2.2  Comparison 

For convenience, we refer to the first mobile-agent distributed computing model
(Master-Slave-LoadGatherer) as Model A, and the second model (Master-Slave) as Model B. The
implications of the different degrees of autonomy granted to the agents in Models A and B can be
summarized as follows:

•  Model  B  has  the  advantage  of  avoiding  resource  allocation  for  the  tasks  of  gathering  and  re¬ 
porting  load  conditions  and  workload  management  required  in  Model  A.  However,  these 
services must be assumed by individual agents or provided as part of the infrastructure. Without
these services, the possible performance gain might not be realized, and the overall task might
instead suffer performance degradation.

•  Services  leading  to  self-scheduling  in  Model  B  could  be  streamlined  using  fewer  features  sup¬ 
ported  by  a  reduced  set  of  resource  and  network  parameters  so  that  each  agent  can  adapt  with 
ease.  This  is  workable  for  a  small  and  homogeneous  distributed  computing  environment.  For  a 
large-scale,  heterogeneous  environment,  simplifying  resource  and  network  parameters  will  ig¬ 
nore  many  complexities  and  result  in  performance  problems.  Much  of  the  difficulty  is  caused 
by  the  heterogeneity  of  resources  and  the  impact  of  variations  in  deliverable  resource  perform¬ 
ance  because  of  the  contention  of  shared  resources  (Berman,  1998). 

4.3  RECENT  TRENDS  AND  DEVELOPMENTS  IN  WORKLOAD-MANAGEMENT  SOFTWARE 

Perhaps  the  most  noticeable  trend  in  WMS  is  the  emergence  of  the  use  of  agents  at  least  at  the  re¬ 
search  level.  Clearly,  widely  used  commercial  WMS  packages  such  as  LoadLeveler,  LSF,  and  NQE, 
that were designed and implemented before the popularity of agent technologies, have many components
that can now be articulated as, in fact, intelligent agents. In this section, we present two research
projects explicitly designed with the use of agents for resource allocation and scheduling, one
at  the  system  level  and  the  other  at  the  application  level.  Another  trend  is  the  use  of  a  system  moni¬ 
toring  service  aimed  at  improving  dynamic  scheduling  and  enhancing  fault-detecting  capability.  In 
addition,  there  is  an  effort  to  coordinate  resources  under  different  WMS  (such  as  LoadLeveler,  LSF, 
and  NQE).  Since  these  local  WMS  packages  are  not  interoperable,  this  endeavor  is  of  utmost  im¬ 
portance  for  creating  a  wide-area  metacomputing  environment. 

4.3.1  Agent-based  Scheduling 

Agent-based scheduling can provide robust and high-performance application execution in a
highly distributed computing environment. Both agent-based systems described in this section use
stationary agents that perform dynamic scheduling.

One example of agent-based scheduling is the NetSolve system developed jointly at the
University of Tennessee and the Oak Ridge National Laboratory (Casanova and Dongarra, 1998;
Casanova, Dongarra, and Seymour, 1998; Casanova and Dongarra, 1997a; Casanova and Dongarra,
1997b). NetSolve is a client-server application that enables users to solve complex scientific
problems using computational resources on the Internet, on an intranet, or belonging to a research
group. Numerical software currently available in the NetSolve system includes ARPACK,
FFTPACK,  LAPACK,  BLAS,  Minpack,  QMR,  and  the  parallel  library,  ScaLAPACK.  Many  inter¬ 
faces  to  Fortran,  C,  Matlab,  and  Java  have  been  designed  and  implemented  to  help  users  access 
hardware  and  software  resources  in  the  NetSolve  system  from  their  program.  There  is  also  a  Java 
graphical  user  interface  (GUI)  based  on  Java  Development  Kit  (JDK)  1.1  for  user-friendly  interfac¬ 
ing. 

The  NetSolve  system  uses  a  remote  computing  method  where  data  from  the  client  (user’s  ma¬ 
chine)  is  sent  to  the  server  for  execution,  and  the  result  is  sent  back  to  the  client.  The  NetSolve  agent 
performs the task of a resource broker for the clients. It takes the request from a client and chooses
the resources in the NetSolve system on which to execute the client’s program, based on
computation-specific and resource-specific information (Casanova and Dongarra, 1997a).

In addition, the NetSolve agent provides load balancing and fault tolerance (Casanova and Dongarra,
1997a). For load balancing, the NetSolve agent estimates the execution time of the client’s
problem on every machine in the NetSolve system and chooses the one(s) with the smallest estimated
time. The execution time estimate is based on a performance model that considers (1) client-dependent
parameters (such as the size of data to send, the size of result to be received, and the size of the
problem); (2) static server-dependent parameters (such as the raw performance of servers, the
complexity of the algorithm, and network characteristics between the client and servers); and
(3) dynamic server-dependent parameters (such as workload). Note that network characteristics (such
as bandwidth and latency) between a client and a server are assumed stable (Casanova and Dongarra,
1997b).
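
The three groups of parameters can be combined into a simple time estimate, as sketched below. The formula, the Server fields, and the function names are assumptions made for illustration only; the actual NetSolve performance model is given in the cited papers and may differ.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    mflops: float      # static: raw performance, in Mflop/s
    bandwidth: float   # static: bytes per second to the client
    latency: float     # static: seconds per message
    workload: float    # dynamic: fraction of the CPU already busy (0 to <1)


def estimate_time(server, bytes_in, bytes_out, flops):
    """Estimated seconds to send the data, compute, and return the result."""
    network = 2 * server.latency + (bytes_in + bytes_out) / server.bandwidth
    # Assumed model: a busy server delivers only its idle share of CPU.
    compute = flops / (server.mflops * 1e6 * (1.0 - server.workload))
    return network + compute


def rank_servers(servers, bytes_in, bytes_out, flops):
    """Servers sorted from the smallest to the largest estimated time."""
    return sorted(
        servers, key=lambda s: estimate_time(s, bytes_in, bytes_out, flops)
    )


servers = [
    Server("fast_busy", mflops=1000, bandwidth=1e6, latency=0.01, workload=0.9),
    Server("slow_idle", mflops=200, bandwidth=1e6, latency=0.01, workload=0.0),
]
ranking = rank_servers(servers, bytes_in=1e6, bytes_out=1e6, flops=1e9)
```

In this example the nominally faster server is heavily loaded, so the slower but idle server is ranked first, which shows why the dynamic workload parameter matters.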

The NetSolve agent uses two mechanisms to promote fault tolerance. First, NetSolve maintains a
client-server protocol so that whenever the NetSolve agent receives a request for a problem to be
solved, it sends back a list of computational servers sorted from the most to the least suitable one.
The client tries all the servers in sequence until one accepts the problem (that is, establishes the
connection). If no connection is established, the client receives a different list and tries again,
until a server accepts or no server remains.
Second, once the connection is established, if the server dies for some reason, the problem is
migrated to another possible server until it is solved or no server remains (Casanova and Dongarra,
1997b).
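
The first mechanism, trying the ranked servers in sequence, can be sketched as follows. This is an illustration only: solve_with_failover, the try_server callback, and the use of ConnectionError to signal a refused or failed connection are hypothetical and are not part of the NetSolve client interface.

```python
def solve_with_failover(ranked_servers, try_server):
    """Try servers from most to least suitable and return the first result.

    try_server(name) returns the result of the computation, or raises
    ConnectionError when the server refuses or fails.
    """
    for name in ranked_servers:
        try:
            return name, try_server(name)
        except ConnectionError:
            continue  # move on to the next most suitable server
    raise RuntimeError("no server remains")


# Hypothetical stand-in for a remote call: the server "s1" is down.
unavailable = {"s1"}


def fake_remote_call(name):
    if name in unavailable:
        raise ConnectionError(name)
    return 42


used_server, answer = solve_with_failover(["s1", "s2", "s3"], fake_remote_call)
```

Because the same loop runs whether the failure happens at connection time or during the computation, it also approximates the second mechanism, where the problem is resubmitted to another server.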

Note that there are multiple instances of the NetSolve agent on the network, especially when the
set  of  computational  resources  spans  several  local-area  networks  (Casanova,  Dongarra,  and 
Seymour,  1998).  In  this  sense,  NetSolve  can  be  characterized  as  a  multi-agent  system. 

Another  agent-based  scheduler  is  the  Application-Level  Scheduler  (AppLeS),  developed  at  the 
University  of  California,  San  Diego.  While  NetSolve  schedules  requests  (and  provides  access  to 
numerical  software),  AppLeS  schedules  stand-alone  parallel  distributed  applications.  For  each  appli¬ 
cation,  AppLeS  has  a  single  active  agent  coordinating  four  subsystems.  One  of  the  subsystems  is  the 
NWS  that  provides  performance  monitoring  and  predictive  functionality  to  the  AppLeS  agent.  The 
other subsystems are Selector (selecting resources), Planner (generating a resource-dependent
schedule), and Actuator (implementing the “best” schedule). The AppLeS agent uses (1) user preferences
(for  machines,  libraries,  performance  measure),  and  (2)  application-specific  and  dynamic  informa¬ 
tion  to  determine  a  performance-efficient  schedule.  In  the  default  scheduling  policy,  the  schedule 
given  by  AppLeS  is  the  “best”  of  the  schedules  determined;  however,  the  user  can  override  it  by 
using  his/her  own  scheduling  policy.  Details  of  the  AppLeS  design  can  be  found  in  Berman  (1998), 
Berman  and  Wolski  (1997),  and  Berman  et  al.  (1996). 

Although the agent-based AppLeS scheduler is still in the development stage, it has generated
interesting lessons learned from each new application (Spring and Wolski, 1998). There is a proposed
collaborative effort between NetSolve and AppLeS to further improve the performance of the NetSolve
system (Berman, 1998).

Overall, agent-based programming has implications for WMS. WMS can be agent-based, that is,
WMS functionality can be performed by system-level agents. This is the case of the NetSolve
scheduling agent(s) that can provide access to network resources for numerical-computing execution
in a grid environment. Alternatively, the scheduling functionality can be assumed by
application-level agent(s) for the user’s application. In the Application-Level Scheduler (AppLeS),
the scheduling agent is a stationary agent that coordinates four subsystems to achieve the best
scheduling. As described in section 4.2.1, other examples are mobile-agent models that either let
mobile agents autonomously seek resources and migrate by themselves, or use model-based mechanisms
(for example, a microeconomic, market-oriented approach) for resource control. In general, achieving
efficient end-to-end resource utilization requires coordination between system-level scheduling and
application-level scheduling where an application-level scheduler can specify the application’s
performance requirements.

4.3.2  System  Performance  Monitoring 

Two  immediate  benefits  of  monitoring  system  performance  are  to  dynamically  forecast  network 
and  computational  resource  load  and  availability,  and  to  detect  faults.  The  two  monitoring  services 
currently  available  are  the  Network  Weather  Service  (Wolski,  1997)  that  is  used  in  AppLeS  (Ber¬ 
man,  1998),  and  the  HeartBeat  Monitor  (Sterling  et  al.,  1998),  a  service  in  the  Globus  system,  that 
has  been  integrated  into  NetSolve  as  an  option  for  an  application-specific  restart-based  fault  recov¬ 
ery  mechanism  (Casanova,  Dongarra,  and  Seymour,  1998). 

The Network Weather Service (NWS) facility provides AppLeS with dynamic forecasts of resource
load and availability. As noted above, bandwidth and latency are assumed stable and treated as static
parameters in the NetSolve system. The NWS, however, treats them as dynamic parameters: it
periodically monitors the performance of network and computational resources in terms of these and
other parameters, and forecasts that performance dynamically. The forecasting methods currently used
are mean-based, median-based, and autoregressive. The system tracks the accuracy of these three
predictors and, at any given time, uses the one with the lowest cumulative error to generate a
forecast (Wolski, 1997).
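
The predictor-selection scheme described above can be sketched as follows. This is a minimal illustration, not the NWS implementation: the class name AdaptiveForecaster, the three predictor functions, the simple first-order autoregressive fit, and the use of absolute error as the accuracy measure are all assumptions made for the example.

```python
from statistics import mean, median


def mean_predictor(history):
    return mean(history)


def median_predictor(history):
    return median(history)


def ar1_predictor(history):
    """First-order autoregressive guess: fit x[t] = a * x[t-1] by least squares."""
    if len(history) < 2:
        return history[-1]
    previous, current = history[:-1], history[1:]
    denominator = sum(p * p for p in previous)
    if denominator == 0:
        return history[-1]
    coefficient = sum(p * c for p, c in zip(previous, current)) / denominator
    return coefficient * history[-1]


class AdaptiveForecaster:
    """Tracks each predictor's cumulative error and forecasts with the best one."""

    def __init__(self):
        self.predictors = {
            "mean": mean_predictor,
            "median": median_predictor,
            "ar1": ar1_predictor,
        }
        self.errors = {name: 0.0 for name in self.predictors}
        self.history = []

    def update(self, measurement):
        # Score every predictor against the new measurement before storing it.
        if self.history:
            for name, predict in self.predictors.items():
                self.errors[name] += abs(predict(self.history) - measurement)
        self.history.append(measurement)

    def forecast(self):
        if not self.history:
            raise ValueError("no measurements yet")
        best = min(self.errors, key=self.errors.get)
        return best, self.predictors[best](self.history)
```

On a steadily doubling series the autoregressive predictor accumulates the least error, while on a flat series with a single spike the median does; the selection step switches between them without any manual tuning.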

While the network sensory subsystem of the NWS is implemented as a modification of the well-known
network performance probing utility, netperf, the Globus HeartBeat Monitor (HBM) was a new design.
Essentially, the HBM has two components. One is the client registration API, which allows a process
to register with the HBM; the HBM then expects to receive regular heartbeats from that process. The
other is the data collection API, which allows another process to obtain information about the status
of a registered process. Developed as part of the core Globus services, the HBM checks the status of
other core services, such as the resource allocation manager and the directory service, and can be used
to implement application-specific fault-recovery strategies (Sterling et al., 1998).
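
The two-part design described above (registration plus heartbeats on one side, status queries on the other) can be sketched as follows. This is an assumption-laden illustration and not the Globus HBM API: the class HeartbeatMonitor, its method names, the timeout rule, and the status strings are all hypothetical.

```python
import time


class HeartbeatMonitor:
    """Records heartbeats and reports a process as suspect after a silent period."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout      # seconds of silence tolerated (assumed policy)
        self.clock = clock          # injectable so the sketch can be tested
        self.last_seen = {}

    # Client registration side: called by the monitored process.
    def register(self, process_id):
        self.last_seen[process_id] = self.clock()

    def heartbeat(self, process_id):
        if process_id not in self.last_seen:
            raise KeyError(f"process {process_id!r} is not registered")
        self.last_seen[process_id] = self.clock()

    # Data collection side: called by any process interested in the status.
    def status(self, process_id):
        elapsed = self.clock() - self.last_seen[process_id]
        return "alive" if elapsed <= self.timeout else "suspected-failed"
```

A restart-based recovery strategy of the kind mentioned for NetSolve could poll status() and resubmit work when a process is reported as suspected-failed; that policy is left to the application.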

Conceptually,  system-performance  monitoring  can  be  implemented  as  an  agent-based  service.  In 
particular,  the  monitoring  service  can  be  decomposed  into  tasks  performed  by  autonomous,  adapt¬ 
able,  agent-based  monitors  that  communicate  constantly  with  peers  and  resources  through  an  agent- 
oriented  communication  mechanism  (Johnston,  1998). 

4.3.3  Interoperability 

Typically  a  large-scale  distributed  computing  environment  is  heterogeneous  with  resources  geo¬ 
graphically  dispersed,  and  local  sites  are  often  under  different  workload-management  software  pack¬ 
ages.  Frequently,  WMS  packages  are  bundled  with  platforms  from  hardware  vendors.  For  example, 
Cray  and  SGI  platforms  use  NQE;  IBM  platforms  (RS-6000  and  SP-2)  use  LoadLeveler  or 
LoadLeveler  with  the  EASY  scheduler;  and  Sun  and  others  use  LSF  (which  has  been  gaining  market 
share  because  of  its  wide  coverage  of  supported  platforms).  Since  these  local  resource-allocation 
mechanisms  do  not  interoperate,  there  is  a  serious  problem  in  resource  management  and  scheduling 
for  wide-area  distributed  computing. 

The ideal way to solve this problem is to standardize resource management and scheduling APIs,
as explored in the PSCHED initiative (PSCHED, 1997). That effort has been inactive for more than a
year, and there is no sign that the proposed APIs will ever be completed, because the main vendors
lack interest. A less-than-ideal but workable solution is to devise higher-level resource allocation
managers that interface with these (and other) local resource allocators. The Globus Resource
Allocation Manager (GRAM), a core service of the Globus toolkit, uses this method to map a
resource specification into a request to a local resource allocator. Currently, GRAM can interface
with six different resource allocation mechanisms: LoadLeveler, LoadLeveler with the EASY
scheduler, LSF, NQE, Condor, and a simple “fork” daemon (Czajkowski et al., 1997).
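
The idea of a higher-level manager that translates one resource specification into requests for different local allocators can be sketched as follows. This is not the GRAM interface: the class ResourceManagerGateway, the translator functions, the site names, and the dictionary fields are hypothetical, and the returned dictionaries are placeholders rather than real LSF, NQE, LoadLeveler, or Condor command syntax.

```python
def to_fork(spec):
    """Translate a generic specification for a simple process-spawning daemon."""
    return {"mechanism": "fork", "action": "spawn",
            "program": spec["executable"], "copies": spec["count"]}


def to_batch(queue_system):
    """Build a translator for a batch queue system identified only by name."""
    def translate(spec):
        return {"mechanism": queue_system, "action": "enqueue",
                "program": spec["executable"], "slots": spec["count"]}
    return translate


class ResourceManagerGateway:
    """Routes one generic specification to whichever allocator a site uses."""

    def __init__(self):
        self._translators = {}

    def register(self, site, translator):
        self._translators[site] = translator

    def submit(self, site, spec):
        try:
            translator = self._translators[site]
        except KeyError:
            raise LookupError(f"no local allocator registered for site {site!r}")
        return translator(spec)
```

The design choice illustrated here is that callers see a single specification format, while each site keeps its own allocator; supporting another allocator means registering one more translator.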

An extensible resource specification language (RSL), based on the syntax for filter specification in
the Lightweight Directory Access Protocol (LDAP), communicates requests for resources between
components of the Globus resource management architecture. To facilitate access to information on the
current availability and capability of resources, the Metacomputing Directory Service provides a suite
of LDAP-based tools and APIs (Czajkowski et al., 1997).
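
To make the filter-style notation concrete, the following sketch parses a flat conjunction of parenthesized attribute, operator, and value clauses. It is a simplified illustration written in the spirit of LDAP-style filters, not a parser for the actual RSL grammar: the function name parse_conjunction, the supported operators, and the attribute names in the usage note are assumptions, and nested expressions are deliberately rejected.

```python
import re

# One "(attribute operator value)" clause; nested parentheses are not handled.
_CLAUSE = re.compile(r"\(\s*([A-Za-z_]\w*)\s*(>=|<=|=|>|<)\s*([^()]+?)\s*\)")


def parse_conjunction(text):
    """Return a list of (attribute, operator, value) tuples for '&(...)(...)'."""
    text = text.strip()
    if not text.startswith("&"):
        raise ValueError("expected a conjunction that starts with '&'")
    body = text[1:]
    clauses = _CLAUSE.findall(body)
    # Anything left after removing well-formed clauses is unsupported input.
    leftover = _CLAUSE.sub("", body).strip()
    if leftover or not clauses:
        raise ValueError(f"unsupported or malformed specification: {text!r}")
    return clauses
```

For example, under these assumptions the string "&(executable=myprog)(count=5)(memory>=64)" yields three clauses that a resource manager could then match against what a local allocator offers.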

Conceptually, the task of interfacing with local WMS can be performed by agent(s) to mitigate
architectural complexity. In particular, GRAMs as well as other subsystems of the Globus
resource-management architecture can be agent-based. Again, an effective mechanism for agent
communications is required.

The overall Globus resource-management architecture was implemented and deployed on
GUSTO, a large computational grid testbed comprising 15 sites and over 330 computers using
LSF, NQE, LoadLeveler, LoadLeveler with EASY, and Condor as local WMS (Czajkowski et al.,
1997). The DARPA-sponsored Synthetic Forces (SF) Express project is one defense-related research
application that has experimented with the Globus toolkit. In March 1998, at the Technology Area
Review Assessment (TARA) briefing in San Diego, the SF Express project showed a distributed,
parallel implementation of the widely used Modular Semi-Automated Forces (ModSAF) Distributed
Interactive Simulation with 100K entities, spanning nine sites and over 1300 processors. This
demonstration set a new record for the number of entities in a Distributed Interactive Simulation
(Brunett and Fitzgerald, 1998).

4.4  ADVANTAGES/CHALLENGES  OF  AGENT  APPROACH 

Perhaps the most far-reaching advantage of agents is in the intelligent integration of information.
Individuals in a modern society face ever-expanding resources that lead to information overload.
The same situation occurs in large-scale distributed computing: computing environments are becoming
wider, encompassing multiple administrative domains with many heterogeneous resources (including
information resources). The practical advantages of agents appear at both the application and system
levels. At the application level, agents help obtain optimal performance by using the best-suited
resources available in the system. At the system level, agents promote load balancing.

Additionally, it is believed that mobile agents reduce network load. In traditional client-server
distributed systems, communications protocols often need multiple interactions to complete a particular
task, especially when large volumes of data are located remotely. Mobile agents allow the processing
to be completed where the data are located, by migrating code to the data. Another consequence of
code mobility is that mobile agents help overcome network latency, because those interactions then
take place locally rather than across the network.

Mobile agents also have important implications for the class of mobile platforms. On one hand,
these platforms usually have limited processing capacity because of relatively slow processors and
limited memory, as well as limited-capability software and environments. On the other hand, they
mostly operate under unforgiving network conditions. In particular, their connections often have low
bandwidth and high latency, and are prone to failure because of inadvertent interference. Moreover,
they do not have permanent network connections, and they are often disconnected for long periods.
Mobile agents are attractive for mobile platforms because they migrate and access resources in the
network, and thus do not need a constant connection to their home platforms. Upon completing their
tasks, mobile agents send results back to the mobile platforms. If a mobile platform is disconnected,
results are temporarily stored on a docking mechanism that re-sends them when the connection is
re-established (Chess et al., 1995; Gray, Kotz, Nog, Rus, and Cybenko, 1996).
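
The store-and-forward behavior of such a docking mechanism can be sketched as follows. This is a hypothetical illustration, not code from any of the cited agent systems: the class DockingStation, its method names, and the in-memory list standing in for the mobile platform's inbox are assumptions.

```python
from collections import deque


class DockingStation:
    """Holds agent results while the mobile platform is offline, then forwards them."""

    def __init__(self):
        self._pending = deque()   # results waiting for the platform to reconnect
        self._connected = False
        self.delivered = []       # stands in for the mobile platform's inbox

    def deliver(self, result):
        if self._connected:
            self.delivered.append(result)
        else:
            self._pending.append(result)

    def on_disconnect(self):
        self._connected = False

    def on_reconnect(self):
        # Forward queued results in the order the agents produced them.
        self._connected = True
        while self._pending:
            self.delivered.append(self._pending.popleft())
```

Because results are queued in arrival order, the platform sees them in the same order whether or not it was connected when the agents finished.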

As with many emerging technologies, there is a large gap between expectations for agents (in
general) and mobile agents (in particular) and the available technology. Many technical challenges
remain for both. First, agents need policies and mechanisms for cooperation. Without cooperation, an
agent-based system is very likely to exhibit chaotic and dangerous behavior and to waste resources.
“Thrashing” could occur in an agent-based system, and would be more likely in a mobile-agent-based
system (Berman, 1998).

Second, agent-based systems need security mechanisms, especially for mobile agents that are able
to roam from one machine to another. Security measures are needed to protect machines from agents
and agents from machines, against both malicious and inadvertent attacks. Two techniques under
investigation are code appraisal and state appraisal. Code appraisal aims to provide programming
language support to improve mobile code safety. Developing a safe language for mobile agents or
mobile code can accomplish this goal. Another possibility is proof-carrying code, which packages a
proof together with the agent so that the host can check the proof’s validity. Because a migrating
agent can become malicious if its state is altered or corrupted, state appraisal is also crucial to
system security (Swarup, 1997; Farmer, Guttman, and Swarup, 1996).

Third, there is a great need for agent interoperability. In particular, agent-based systems need
standardized methods for common operations including agent creation, deletion, and, for mobile
agents, transfer across the network. Currently, the Mobile Agent Facility (MAF) Specification,
drafted by IBM, General Magic, Crystaliz, and GMD FOKUS, with Open Group support, has been
jointly submitted to the Object Management Group (OMG) for consideration as an industry standard
(Crystaliz, General Magic, GMD FOKUS, IBM, and the Open Group, 1997). Standardization efforts
are needed in other areas such as communication protocols and system integration.

Finally, agent technology will benefit from efforts to provide a cost-effective agent-development
infrastructure including languages, tools, and environments. Wide-area, heterogeneous, distributed
computing will, in turn, benefit from agent technology. In particular, tools are needed to create
resource- and network-aware agent-based applications. These applications must adapt to the dynamic
system state and be able to assess the performance impact and select potential platforms.
Development environments that support debugging and environment tuning for agent-based
applications are crucial because of resource heterogeneity. In a wide-area, heterogeneous, distributed
computing environment, other users’ impact on shared resources and system asynchrony make
diagnostics and tuning extraordinarily difficult (Berman, 1998).

4.5  CONCLUSION 

In this chapter, we presented several fundamental developments in workload management for
wide-area distributed computing. As discussed in section 3, WMS interoperability is crucial when
resources are under different WMS packages. To mitigate the lack of interoperability among WMS
packages, one approach is to devise high-level interfaces that coordinate with local WMS packages.
This approach was adopted in the design of the resource-management architecture of the Globus
toolkit for wide-area, heterogeneous, distributed computing. Another development is the use of
system-performance monitoring services to facilitate dynamic scheduling and fault detection.

It is believed that the rapidly growing agent technology is well suited to wide-area, heterogeneous,
distributed computing because it can help enhance design and performance. In particular,
agent-programming models can offer scheduling at the system level as well as the application level.
Overall, efficient end-to-end resource utilization requires coordination between system-level
scheduling and application-level scheduling, in which an application-level scheduler can specify the
application’s specific performance requirements. Furthermore, mobile agents provide better support
for mobile computing. Mobile platforms (such as laptop, notebook, and palm-sized computers as
well as handheld personal digital assistants, or PDAs) are characterized by limited storage and
processing capacity, limited-capacity software and runtime environments, low-bandwidth, high-latency
connections, and long periods of disconnection. Mobile agents enable this class of clients to get
relevant results where the necessary computation and filtering are performed on high-performance
servers.

Many technical challenges to agent technology remain. Several main issues have been presented in
this section:

• Many security issues in deploying mobile-agent applications, including Java-based mobile-agent
applications, have not been resolved (Chess et al., 1994; Karjoth, Lange, and Oshima, 1997). In
particular, while agent transfer is considered a multi-hop operation, current security technology has
not solved the multi-hop-authentication problem (Crystaliz, General Magic, GMD FOKUS, IBM, and
the Open Group, 1997). In a closed distributed computing environment, the principal concern is not
so much malicious attacks as accidents that result from errors (for example, software bugs) when
deploying mobile-agent-enabled software. Note that the effects of accidental security violations, no
different from those of intentional ones, could be detrimental to the safety and integrity of system
resources, including databases. Indeed, it was recommended that “substantial investment in mobile
agent systems may await further work on new security techniques oriented toward mobile agents”
(Swarup, 1996).

• Agent technology urgently needs a standardized specification. The wide differences in architecture
and implementation among the agent and mobile-agent systems currently available prevent
interoperability and, thus, serious adoption. The Mobile Agent Facility (MAF) Specification,
drafted by IBM, General Magic, Crystaliz, and GMD FOKUS, with Open Group support, was
jointly submitted to the Object Management Group (OMG) for consideration. This effort covers
common agent operations including agent creation, termination, and transfer. Mobile agents
are, as noted in the current MAF draft, “a relatively new technology that is fueling a new
industry” (Crystaliz, General Magic, GMD FOKUS, IBM, and the Open Group, 1997). Consequently,
the sooner the specification is finalized, the better the chance of sustained growth.

• Besides the security limitations of mobile agents, knowledge representation that helps agents
communicate with other agents is another important issue that needs resolution to ensure a viable
agent technology (Kiniry and Zimmerman, 1997). Current technologies (such as those from the
DARPA-sponsored Knowledge Sharing Effort) are highly complex to integrate into agent
systems. Furthermore, present agent communication mechanisms are extremely limited in
flexibility and possibilities (Kiniry and Zimmerman, 1997). Knowledge representation and
communication enable applications in collaboration, intelligent planning and scheduling, electronic
commerce, and many other areas.

• Agent technology is still in an early developmental stage. For workload management of a
large-scale, heterogeneous, distributed computing environment, it can build on tools that leverage
well-developed infrastructure, as in the case of research-oriented projects such as the agent-based
scheduler, AppLeS (together with the resource and network monitor and predictor, NWS). In
general applications as well as in workload management applications, there has been a pervasive
emphasis on agent “smartness.” However, cooperation and adaptive learning
are  critical  to  building  powerful  applications.  Indeed,  “individual  agents  needn’t  be  so  smart  to 
function  collectively  in  a  complex  and  useful  manner — just  like  an  ant  colony — and  that  agents 
can  learn  from  one  another”  (Petrie,  1997). 

• Programming engineering and scientific applications using agents (and mobile agents) has
drawbacks similar to those of the message-passing paradigm, except that the latter has been
standardized by the Message-Passing Interface (MPI). In particular, agent programs are generally
difficult to write and exceptionally difficult to debug. For example, the Master-Slave pattern
requires that the program’s data structures and the entire application be explicitly partitioned so
each Slave (here, a mobile agent) can perform its own specified subtask. Further complications
arise in communications among Slaves that require synchronization. Except for coarse-grained
tasks that are inherently data-independent, or those that decompose easily and do not need
much communication, programming with agents is a complex endeavor, not unlike low-level
language programming. Currently, agent technology is not mature enough for processing real-life,
mission-critical tasks in a wide-area, heterogeneous, distributed environment. In the near
term, agent technology will benefit from a cost-effective agent-development infrastructure
including languages, tools, and environments.

• The DARPA Information Systems Office (ISO) is sponsoring projects under the Control of
Agent-Based Systems (CoABS) Program to develop and demonstrate techniques to safely
control, coordinate, and manage large systems of agents. The ISO is also funding the Command
Post of the Future (CPOF) Program, in which agent technology is one of many candidates under
assessment. More information can be found at the respective web pages listed in the General
References section.
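
The explicit partitioning that the Master-Slave pattern requires, as noted in the list above, can be sketched as follows. This is a minimal, hypothetical illustration for a coarse-grained, data-independent task: local threads stand in for the mobile agents, and the names partition, subtask, and run_master, as well as the sum-of-squares workload, are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(data, parts):
    """Split data into at most `parts` contiguous chunks of near-equal size."""
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for index in range(parts):
        end = start + size + (1 if index < extra else 0)
        if end > start:
            chunks.append(data[start:end])
        start = end
    return chunks


def subtask(chunk):
    """Work done independently by each worker: here, a sum of squares."""
    return sum(x * x for x in chunk)


def run_master(data, workers=4):
    """Master: partition the data, hand one chunk to each worker, combine results."""
    chunks = partition(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(subtask, chunks))
    return sum(partials)
```

Even in this simple case the programmer must decide how to split the data and how to recombine the partial results; tasks whose workers need to exchange data and synchronize would add the further complications the text describes.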

4.6  BIBLIOGRAPHY 

IEEE Internet Computing frequently publishes articles on agent-based applications and infrastructure.
The special issue on “Internet-based Agents” (Vol. 1, No. 4, July-August 1997) contains
many pointers to resources for agents and Java-based agents. The magazine has a regular column
titled “Agents on the Web.”

IEEE  Internet  Computing: 
http://www.computer.org/internet/

IBM’s  Aglets  Workbench,  now  called  Aglets  Software  Development  Kit  (ASDK),  is  available  as 
free  software  from  IBM  Tokyo  Research  Laboratory.  It  is  a  first-of-its-kind  visual  environment 
for  creating  mobile-agent-based  applications  in  Java.  Software  and  documentation  are  available 
at 


IBM  Aglets  Software  Development  Kit: 
http://www.trl.ibm.co.jp/aglets/ 

The  DARPA  Information  Systems  Office  (ISO)  has  several  agent-related  programs,  including 
Control  of  Agent-Based  Systems: 
http://dtsn.darpa.mil/iso/programtemp.asp?mode=126
Command  Post  of  the  Future: 
http://www-code44.spawar.navy.mil/cpof/ 

DARPA sponsored the three-day Workshop on Foundations for Secure Mobile Code in March 1997
at the Naval Postgraduate School. Position papers presented at the workshop are available at

Workshop  on  Foundations  for  Secure  Mobile  Code: 

http://www.cs.nps.navy.mil/languages/statements/

The ACM 1998 Workshop on Java for High-Performance Network Computing was held on the campus
of Stanford University, Palo Alto, CA, February 28-March 1, 1998. It was the third in a series of
workshops started in 1996 to explore the use of the Java programming language for high-performance
network computing, and scientific and engineering computing. Proceedings of the 1998 workshop are
available online at

ACM 1998 Workshop on Java for High-Performance Network Computing:

http://www.cs.ucsb.edu/conferences/java98/ 

Globus is an integrated software system for distributed parallel computing intended to create
wide-area virtual computers. Recognizing that virtual computers must support a variety of applications
and programming models, the Globus system adopts a mix-and-match approach by providing the
Globus metacomputing toolkit, from which users and developers can select components to meet their
needs. The toolkit, which embodies core technologies for computational grids, covers the areas of
resource management, security, communication, and data access.

Globus: 

http://www.globus.org 

An  interesting  defense  application  using  the  Globus  toolkit  is  the  DARPA-funded  Synthetic  Forces 
(SF)  Express  project  at  Caltech  that  has  recently  set  a  record  in  Distributed  Interactive  Simula¬ 
tion  (DIS). 

SF-Express: 

http://www.cacr.caltech.edu/sfexpress/ 

The well-known netlib repository of freely available numerical software now offers the NetSolve
system, which gives users access to preinstalled numerical libraries that execute on the remote
computing resources of a wide-area virtual computer.

NetSolve: 

http://www.netlib.org/netsolve/ 

The  agent-based  Application  Level  Scheduler  (AppLeS)  and  the  network  and  resource  monitor  Net¬ 
work  Weather  Service  (NWS)  have  a  homepage  at 

AppLeS: 

http://www-cs.ucsd.edu/groups/hpcl/apples/apples.html